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Analysis of the SIMPLE code for dataflow computation 


John M. Myers 


ABSTRACT 


We analyze a problem in hydrodynamics from the standpoint of 
computation on a dataflow computer that is not yet fully specified, with 
the objectives of helping to further specify the computer and helping to 
develop VAL as its source language. Lawrence Livermore Laboratory supplied 
the algorithm for hydrodynamics, including heat flow, as a 1749-line 
FORTRAN code called SIMPLE. 


The algorithm viewed as ‘abstract' (i.e. independent of physical 
arrangements in space and time for its realization) is shown to imply 
spatial and temporal structure that must appear in any and all implementa- 
tions. Both for hardware design and program compilation it is useful to 
map this structure to grosser levels of description, with the grosser 
levels reflecting modularity of computational resources conjoined with 
modularity of the algorithm. Following Holt (1979) we use role diagrams 
to display spatio-temporal structure at different descriptive levels, so 
as to guide translation into VAL as well as the analysis of the time to 
compute. 


Inter-resource communication essential to the problem is displayed, 
and various issues of machine design are defined. Using VAL with one set 
of extensions, we express the algorithm so that in principle it can be 
compiled for execution by a dataflow computer. Input-output functions 
beyond those implied by the SIMPLE code are discussed. A second set of 
extensions to VAL is advocated to express the conjunction of problem and 
resource modularity, so as to guide compilation. The dependence of time 
to compute on the number of processing units is shown for various aspects 
of the problem. 


KEYWORDS: DATAFLOW, ALGORITHM ANALYSIS, PARALLEL COMPUTATION, 
COMPUTATIONAL HYDRODYNAMICS, ROLE DIAGRAM. 
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Analysis of the SIMPLE Code for Dataflow Computation 


1. Introduction: Hydrodynamics Meets a Dataflow Computer 


The equations of physics are prescriptions for calculating; from 
some presumed starting conditions, they generate a “future”. The calculation 
of this "future" involves many events, each of which "consumes" items -- values 
of variables -- and "produces" other items. Because an item cannot be consumed 
before it is produced, these events are subject to constraints of sequencing. 
These constraints impose a pattern on the calculation. 

Although the equations of physics constrain the calculation, they 
do not fully determine it. The pattern is partly determined also by the 
method of solution employed and by the structure of the computer. Thus the 
same (partial differential) equations can result in different patterns of 
calculation, according to the method of solution and the arrangement of 
computational resources. For this reason the pattern of computation for 
a given type of problem, say hydrodynamics, evolves as methods and computational 
resources evolve. Pattern, method, and resources are coupled in their evolution, 
with each selected in part to support and to draw on the others. 

Over most of history the computer (human or machine) had only a 
sequential processing capacity, so that computation was necessarily performed 
one step after another. Thus methods which emphasize concurrency were not 
called for, and as a result are today relatively unexplored and undeveloped. 
Not only computers, but also numerical. methods have evolved in a context that 
is weighted toward the sequential, and away from the concurrent. 

Via such means as dataflow architecture (see Dennis, 1978), an 
increase in speed can be brought about by an organization of computational 
resources that allows concurrency of many events. This report is concerned 


with fitting -- or refitting -- a pattern that evolved in a sequential context 


© Die 


onto a dataflow computer. The report is based on a case study of an example 
program written in FORTRAN for a sequential machine for the solution of a 

problem of hydrodynamics, including heat Flow. This program was prepared by 
Lawrence Livermore Laboratory, and is named SIMPLE. The initially presented 


questions were: 


1.1) What is involved in translating the SIMPLE program from FORTRAN 
(suitable for a sequential computer) into a dataflow language 


(the VAL language in particular); and 


1.2) Compared to a sequential computer, what speed advantage can be 
expected from a dataflow computer in the execution of the SIMPLE 


program? 


To realize the potential advantage of a dataflow computer, its 
program must be free of unnecessary sequencing constraints. Sequencing 
constraints come from many sources, and their necessity depends on ones 
point of view. Primarily we report on the narrow view that sees sequencing 
constraints as imposed by the data dependencies of the FORTRAN program. 

In this view the "translation" per item 1.1 entails the removal of sequencing 
only as far as possible without disrupting the data dependencies expressed in 
the FORTRAN program. Such a eransiated program would be expected to produce 
numerical results identical to the FORTRAN program, apart from round-off errors. 


But the narrow view fails to: 
a) realize the potential for advances in speed, and 


b) open the physics itself to new perspectives made possible by the 


power to express concurrency. 
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Although their resolution is outside the scope of this report, we shall 
define some broader issues of solution methods, machine design, and physics. 

With respect to item a), the translated program will still contain 
unnecessary sequencing constraints, imposed by a method of solution of the 
equations of physics. For example, the back-substitution method (Crowley, 
Hendrickson and Rudy, 1978) for solving the implicit formulation of heat 
flow does not realize the potential of dataflow architecture, and it appears 
that a method could be developed that (for a dataflow computer, but not for 
a sequential computer) would be substantially faster. Thus in presenting our 
results, we shall distinguish sequencing constraints that come from the 
happenstance of the numerical method embodied in SIMPLE from constraints that 
come from less malleable sources. 

Once the method of solution is considered as variable and not fixed, 
issues of machine design surface. If methods and machine are to be developed 
in concert, it might be best to tailor the machine to a certain class of 
methods, to the detriment of its performance with methods outside that class. 
If the dataflow computer is seen as a network of interconnected processors, 
then this issue arises with respect to the communications facility that provides 
processor-to-processor communication. The problems under study stem from 
spacially distributed fields that interact in a purely local manner. From 
this locality one can show that the equations can be solved on a dataflow 
machine using a communications network which directly links only nearest 
neighbors, so that a "global" communications facility is not required. Local 
networks are cheaper and faster than global networks; however the methods 
that they support have drawbacks with respect to speed, so that the question 
of local vs. global remains open. One way of posing the issue is through 


the following question: 
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1.3) What number N' of globally connected (i.e. fully connected) 
processors have the same cost as N locally connected processors, 
under the condition that the total memory of the two configurations 


be the same? 


' The idea is that the speed loss from the restriction to local connectivity 


might be regained through the use of a larger network of processors. In other 


._ words for a given investment there is a trade-off between fewer fully connected 


processors and more Jocally connected processers. If these two contrasting 
configurations are to be evaluated in their performance on a given problem, 
then total system memory should be the same for each configuration. 

With respect to item b) it may be of theoretical interest to 
introduce a class of dataflow computers te model what is meant by the equations 
of physics. 


2. The Hydrodynamic Fields 


Given finite propagation velocities, the fields defined by the 
equations of physics can be pictured, as they were by Huygens, as networks 
of communicating entities, all operating concurrently. A partial differential 
equation represents a limit as the network becomes progressively more 
fine-grained. Computation is possible, however, only if the limit is not 
taken, or if it is "undone". 

Via one or another numerical method the partia: differential 
equations are transformed to difference equations defined on a spatial mesh 
of N zones, with each zone have corners at nodes, as shown in Fig. 1. In 


terms of the parameters defined in SIMPLE, one finds 
N = (LMX-LMN)*(KMX-KMN) . . (Eq. 2.1) 


SIMPLE employs a Lagrangian formulation, in which the mesh is deformable; 

each node is thought of as a “tagged atom", carried along in a fluid whose 
motion is described by the difference equations. By extending the discussion 
of Morse and Feshbach (1953, vol 1, p.847-8) to equations of hydrodynamics, 
one sees Huygen's principle works on a sufficiently smal] region of the mesh. 
For a given node, one can choose an enclosing curve through the zones that 
bound it, and with the result that, by interpolation, the acceleration of 

the node depends only on the properties of the zones that bound it. A similar 
argument could lead to the conclusion that the current properties of a zone 
depend only on past poperties of the nodes at its corners, but SIMPLE is 

based on a variation of this argument. Properties such as pressure and density 
are defined only for zones and not for nodes, and the current properties of 

a zone are shown to depend on their past values together with the current 


deformation of the zone, along with the current rate of deformation of the zone. 
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Figure 1: Nodes (shown as heavy dots) and zones (enclosed by dotted lines). 
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The main fields are defined by Crowley, Hendrickson, and Rudy (1978) 


as follows: 
Zonal 
name in 
Field FORTRAN. Definition 
€ E energy per unit mass 
p P pressure 
q Q artificial viscosity 
e RHO density 
8 TEMP temperature 
T specific volume 
K 


thermal conductivity . 


In addition the positions and velocities of the nodes form a field as a function 
of node indices k and 1: 
Nodal 


name in 
Field FORTRAN Definition 


<b 


Rez position as function of k,1 


UW velocity as function of k,1. 


sb 


The field equations are 


dé = - (peq)SE + FUR VB (Eq. 2.2) 
@=0(C,€) (Eq. 2.3) 
K = K(@) (Eq. 2.4) 
q=q(e,4U, €) (Eq. 2.5) 

og (Eq. 2.6) 


(fg. 237) 
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3. Communications and the Speed and Configuration of a Dataflow Computer 


3.1. General issues 

Because the least familiar aspect of a dataflow computer is its 
communications facility, we give a preliminary statement of issues of speed 
and machine design posed by the burdens that the SIMPLE problem will place 
on such a facility. 

A computational algorithm, such as the FORTRAN program of SIMPLE, 
defines a flow of data values into and out of arithmetic operations. By 
analyzing this flow, one can produce a dataflow graph that displays not 
only the concurrency that is allowable within the confines of the algorithm, 
but also an abstract pattern of communication. For the SIMPLE problem, most 
of the dataflow graph can be modularized onto regions corresponding to the 
mesh of Fig. 1: one region for each zone, and one for each node. 

To perform the computation, resources are required: physical actors 
must be provided to carry and transform the values that are specified by the 
dataflow graph. The correspondence between physical actor and role as 
value carrier is in part subjective, and inescapably so. There is no sure 
rule for the "right way" to establish the correspondence, although there are 
criteria by which to exclude many "wrong ways": wrong ways lead to failure 
(e.g. of performance or of budget). In the light of currently well developed 
technology, we may start by assigning a physical processor to each nodal and 
zonal region of the dataflow graph. If each such processor comes with attached 
memory, then a dataflow computer can consist of a set of processors together 
with a communications facility that links them. 

Affordable communications facilities never offer the full measure 
of speed, bandwidth, freedom from blocking, and other properties that it 


would be "nice" to have. Compromise is necessary. The determination of an 
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economic configuration is outside the scope of this work, but to help prepare 
the ground, we consider the message patterns that are generated by the SIMPLE 
program. All of the communications facilities under consideration could 
handle all of these patterns, but different facilities will exhibit different 
speeds for different patterns. Thus it is helpful to find out what patterns 
really matter. 
The burden placed by a dataflow graph on the communications facility 

depends on: 

.1 the connectivity of the dataflow graph -- how "scrambled" are 


the needed connections; 


.2 the number and accuracy of the field variables to be transmitted. 


A given dataflow computer can compute a dataflow graph corresponding 
to a square mesh of D zones without having to time-share its hardware (as would 
a sequential computer). Thus D measures the largest mesh that a given 
dataflow computer can handle in some "fully concurrent" manner. If D is 
to be increased, then additional hardware must be incorporated into the 
dataflow computer. In many cases of interest one expects to find N>> D, so 
that each processor will have to be time-shared among N/D regions. The 


burden on the communications facility will thus also be influenced by: 


.3 the way in which resources are time-shared over different regions 


of the dataflow graph. 


Item .2 affects only the size of the messages to be transmitted and will not 
be further considered here. Items .1 and .3 affect the "from-where-to-where" 


aspect of the communications burden, and we now discuss them further. 
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3.2. Connectivity in the face of resource sharing 


By means of a role diagram, further explained in Appendix A, 

Figure 2 illustrates the connectivity exhibited by the main cycle of a problem 
like SIMPLE, but reduced to one space dimension and stripped of heat flow. 
Figure 2 can be read as a marked graph over which tokens are moved to simulate 
the occurrence of calculational activities; the top row of circles are viewed 
as initially marked with tokens. A horizontally connected row of boxes 

( O12) is a calculational activity. The inputs to an activity 
arrive from above; the outputs depart below -- in other words the "flow of 
time" is downward. Boxes connected by double bars ( {= =(1) produce identical 
copies of the same output value, and thus portray fanout. The figure is thought 
of as wrapped around a cyclinder, with each bottom circle "wrapped up" to | 
coincide with the circle directly above it, so that a cycle is defined. 

The diagram is to be interpreted not just as an abstract flow of 
values, but as a flow of values carried by physical actors. Each vertical 
line in Fig. 2 requires a physical resource, like a processor or a buffer, 
that carries a value from one calculational activity to another. Each hor- 
izonat] row likewise specifies a physical requirement -- e.g. for the 
processing resources needed if the indicated values are to meet and be trans- 
formed. The diagram of Fig. 2 looks similar to a dataflow graph because it 
assumes no constraints due to any scarcity of resources: it assumes that 
processors and communications links are provided in abundance, at least at 
the level of detail portrayed. Resource constraints would change the picture; 
for example, Fig. 3 shows the same values as they would flow under additional 
constraints imposed by a scarcity of processors such that each processor 


must handle two adjoining activities. 
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Figure 3: Constraints on concurrency (heavy lines) imposed by sharing of processors. 
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Figure 4: Grosser view of Fig. 3 highlighting connectivity between processors; 
(compare with Fig. 2). 
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Alternate view of Fig. 4 using the notation of buffered communication. 


Figure 5: 
(See Appendix A, Sec. A.19 for more on the notation. ) 
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The suggested assignment of one processor to one nodal or zonal 
region of the dataflow graph was in some degree arbitrary. Given a smal] 
mesh and many processors, concurrency might be enhanced by assiging more than 
one processor to each such region. For a mesh large compared to the number 
of available processors, each processor would have to be assigned a larger 
piece of the dataflow graph. A question then arises: under this circumstance 
does simplicity in the connectivity of the dataflow graph imply that simplicity 
can be maintained in the connectivity of the processors? The answer depends 
on how a single processor is assigned to cover more than one region. Figure 
3 illustrates the principle that such assignment can be made so that the 
connectivity between processors is no more complex than is the connectivity 
between nodal and global regions. Figure 4 highlights this connectivity 
among shared processors; the same connectivity can be maintained when processors 
are shared over larger regions of the dataflow graph. By use of the 
abbreviated notation described in Sec. A.19 of Appendix A, Fig. 5 shows the 
same connectivity as Fig. 4, but with the communications buffers (the unlabeled 
roles) suppressed. A slanting bar implies: a) that the lower of the activities 
consumes something produced by the upper activity; and b) that the two 
activities are linked by an intermediating resource (such as a buffer) that 
is not explicitly shown. 

What can we learn from this example that is more generally applicable? 
Sharing of processors reduces the size of the communications facility required 
of a dataflow computer, at the cost of speed. For this example and this 
manner of assigning processors, the communication pattern, although becoming 
smaller, preserves its connectivity; be it one or many regions of dataflow 
graph per processor, each processor communicates only with itself and with its 


nearest neighbors. In the SIMPLE problem one finds somewhat more complex 


ke ee 
more connectivity in the dataflow graph. Two points are to be noted in the 


assignment of processors to pieces of dataflow graph of SIMPLE. 


.3. A mesh of N zones can be parcelled out to D processors in such a 
way that the connectivity among processors preserves any "localness" 
present in the connectivity among nodal and zonal regions of the 


dataflow graph. 


-4. Other schemes of assigning processors that place additional demands 


on their connectivity may offer advantages in speed. 


Because of item .3 we can learn what connectivity is necessary to D processors 
of a dataflow computer that is to solve a mesh of N zones, merely by studying 
the connectivity of the dataflow graph. Because of item .4 we must bear in 
mind that there will be additional questions of trade-offs between speed, 


cost, and the connectivity of the communications facility. 
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3.3. Fitting the Computation to the Minds of the Analysts: Input and Output 


Programs and parameters flow into a pattern of computation, and 
significant features of the computation flow out. In some cases this interaction 
can be partitioned into a sequence of phases: input, computation, output. 
However, as the size of the computation increases there is progressively 
more need to operate interactively, so that the selectivity of what flows 
out can be increased along with the amount of computation. 

Output from a dataflow machine is apt to involve transforming 
an array, or some feature (such as a contour) extracted from it, into a 
sequence of characters to be transmitted -- either to a person or to a 
storage device. Such operations are bandwidth limited and threaten to 
demand excessive time or buffering or both. As the scale of computation is 
increased, it becomes necessary to increase the selectivity of feature 
extraction in near proportion. 

One reason that extracting features is challenging is that what is 
significant sometimes becomes apparent only as the computation unfolds, so 
that the definer of significance must interact with the computation. Further, 
significance varies according to the viewer. Because of this "vaporous" 
quality, one approach is to report out "all the data" from a computation, 
so that it forms a database that can later be manipulated according to taste. 
As the scale of computation increases, this approach becomes progressively 
more demanding, and may become unrealizable. 

An alternative approach would be to provide a facility by which 
multiple viewers of the computation could each construct filters and other 
“feature extractors" in real time as the computation proceeds. No doubt 
some users would still build “databases", but they would have the opportunity 


(and perhaps the necessity) of building more selectively than has been the 
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common practice. 

This approach generates requrirements to be met by dataflow hardware 
and software. The image is of a controllable "funnel" or "tree" that sucks 
up arrays of field variables as the computation proceeds, discards what is 
irrelevant, and issues a stream of characters that conveys the features 
specified by one or another analyst. The "specification of relevant features" 
could by supplied prior to execution, or could be supplied interactively 
by the analyst as the computation unfolds. 

Such a scheme demands software interfaces that can accept analyst- 
supplied specifications of the features to be selected. Presumably the 
structure should accomodate multiple analysts. The hardware requirements 
are an extension of those already generated by the needs to sum over an 
array and to convert an array into an output stream for transmission over 
a single communications line. For example, program-controlled merging 
of array elements into a stream can provide efficient sorting. Just as they 
are needed to sum and to report out all the elements of an array, tree 
structures will be needed to report out selected elements of an array (such 
as the elements of a contour). However, one expects an advantage from more 
flexible control of tree connectivity and of tree, nodal and zonal processing 


than would be needed just to solve the field equations. 
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A. Modeling the Time to Compute 

The prediction of execution time of SIMPLE on a dataflow computer 
that is not yet fully specified is a complex task which, in this report, 
can be started but not completed. For this reason we separate a general 


discussion of what needs to be undertaken from a sketch of initial results. 


4.1. Choosing an appropriate form of model 

The question of time to compute is a question of what happens when 
an abstract pattern -- the algorithm of SIMPLE -- meets a configuration of 
physical resources -- communications lines, switches, buffers, processors, etc. 
that compose a dataflow computer. The modeling of computation time entails 
the modeling of the joining of the abstract event of the algorithm with the 
physical event of the configuration. This calls for a modeling form that 
straddles abstract (i.e. input-output) relations and physical circumstances. 
For example, we are forced to observe that anything that is (even a value) 
must be some place, such as on a communications line, in a buffer, etc. 
We must learn to see something like a dataflow graph as having, in addition 
to its implications for abstract values, implications concerning the resources 
required to support the logical operations on values. As a foundation for 
this shift in view, we turn to Holt's (1979) concept of the role played by an 
actor who carries a value. The value is in the domain of mathematics and 
algorithms; the actor (human or mechanical) is in the domain of space and time. 

It would be advantageous to have a gross model with only a few 
parameters, both to estimate the time for a dataflow computer to solve the 
SIMPLE problem, and to help in configuring an implementation of a dataflow 
computer. However, a believable gross model of such a complex situation 


can be derived only by condensing a model that encompasses sufficient 
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complexity to account, for example, for the effects of pipe-lining and 
of communications bottlenecks. It thus appears that the modeling form 
should lend itself to different levels of detail. 

The modeling method must encompass the concurrency exhibited by 
dataflow architecture. This requirement rules out models based on the concept 
of a system state, and directs toward models based on Petri nets. 

| The modeling scheme must provide for the modeling of different 
methods of numerical solution. For example, the implicit formulation of 
heat flow results in a difference equation, the solution of which is equivalent 
to the inversion of a certain near-diagonal matrix. The method of inversion 
used in SIMPLE is that of back-substitution. However, it appears possible to 
develop an alternative method that would impose far fewer unnecessary 
sequencing constraints, and would hence better realize the potential advantage 
of dataflow architecture. 

The SIMPLE program uses a global determination of a time step that 
varies from one cycle to another, but is invariant over the mesh. It appears 
that in the computation of hydrodynamic shock, there would be a substantial 
advantage in providing for the local determination of time steps that would 
vary not only from cycle to cycle, but also from location to location over the 
mesh. Such methods are used in the calculation of gravitational fields and 
in relativistic fluid dynamics, as is discussed by Misner, Thorne and Wheeler 
(1970, Chap. 42). Although this extension of method is outside the scope of 
our present work, we require that the modeling method encompass time steps 
as local values derived on an even footing with other field quantities. 

These requirements suggest modeling based on the concept of a 
Petri net. Because of its capacity to join abstract and physical operations, 
we choose the modeling scheme of Holt (1979) to express the essential logical 


and physical dependencies. For a discussion of the concepts, the reader is 


ee 


referred to the cited report of Holt. As a “quick and dirty" view of "how 


to do it", Appendix A describes the modeling conventions. 


4.2. The need for speed 


Faster computers are desired to allow a finer grained mesh. 
Consider a given physical domain and a given duration of hydrodynamic 
interaction. As the mesh is made finer the number of zones, N, increases, 
and moreover the physical time step achievable in a cycle of computation 
decreases as 1/VN. Therefore the time to compute increases as yl? 
This dependence applies to a dataflow computer with D << N, just as it does 
to a sequential computer. 

To decrease the linear dimension of the zones by a factor of 10, 
N must increase by a factor of 100, and to maintain a fixed time to compute, 
given the necessary decrease in physical time dten; the speed of the computer 
must be raised by a factor of 1000. 

One should not that the constant of proportionality that relates 
the allowable physical time step to 1/)/N depends on the numerical method 
used, and that the freedom to choose an advantageous method depends on the 
connectivity of provided by the communications facility of the dataflow 
computer. Richer (e.g. more than nearest-neighbor) connectivity supports 
larger time steps, but then richer connectivity “jos the computer and requires 
an investment that could otherwise buy more processors; thus there is a 


trade off. 
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4.3. The computational cycle 


The SIMPLE computation consists of initialization followed by 
repeated execution of a main cycle. A cycle consists of computing the 
velocity and position of each node, and then computing the properties (such 
as pressure and density) of each zone. The cycle involves times in two senses: 
a physical time step (e.g DTNPH in SIMPLE); and a time to compute the cycle. 
Because the initialization is done once and the cycle is repeated many times, 
the (total) time of computation is nearly independent of the time to initialize 
the computation, and is essentially the time to compute a cycle multiplied by 
the number of cycles. 

The computational cycle can be partitioned either in terms of 
the physics or in terms of the concurrency and connectivity that it presents. 
These two partitionings result in somewhat different pictures. The following 


is a compromise between the two. We view the cycle as composed of the 


following phases of activity: 


.1. establish boundary values (by means of "ghost" nodes and zones); 
.2. calculate velocity and position of interior nodes; 


.3. calculate zone variables for interior zones (e.g. pressure, 
specific energy, artificial viscosity, density) except for 


temperature; 


-4. calculate temperature and recalculate energy to include the 


effect of heat flow; 
.5. calculate the time step for the next cycle; 


.6. calculate totals: work done on boundary, energy lost, etc. 
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.7 extract needed output and bring in parameters to control 
subsequent output, as discussed in Sec. 4.5. 


Figure 6 schematically displays the types of connectivity, and 
hence concurrency, in the flow of data prescribed by SIMPLE over a network 
of processors, with one processor assigned to each node and each zone of 
the dataflow graph. Additional processors are assumed to handle the "tree" 
connectivity of phases 5, 6 and 7. As noted in Sec. 3, if fewer processors 
are available, they can still be connected with the same connectivity, by 
assigning each processor a set of contiguous zones, contiguous nodes, or 
portion of the "tree". If more processors are available, then more than one 
can be assigned to a given nodal or zonal region of the dataflow graph, with 
the result that a higher degree of parallelism will be achieved. Some 


possible assignments of this type are illustrated in Appendix B. 


ghost 
node 
Phase 1: Establish 
boundary values via 7 
ghost nodes and 
zones (typical row 
or column). i 
Phase 2: Calculate 


velocity and posi- 
tion of interior 
nodes (typical row 


or column). 6.0) 
Phase 3: Calculate 
zonal values except 


temperature (typical 
row or column). 


Phase 4: Calculate temp- 
erature and correct energy: 


calculate CBB and DBB 
(typical row or column); 


Z-sweep 
(typical column, all 
columns in parallel); 


R-sweep 
(typical row, all 
rows in parallel); 
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Phase 5: Calculate next time | 
step and distribute ("tree" 
connectivity covers all zones): 


calculate locally, 
then take minimum; 


distribute. 


LG 


Phase 6: Calculate total 
internal energy and energy 
exchange across boundary 
("tree" connectivity 

covers all zones; see note a.) 
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Phase 7: Input/output: 


test values (e.g. against 
thresholds) and extract 
features (see Note b.) 


from 


receive changes in param- 
eters (e.g. thresholds) 
that control feature 
extraction. (See Note 

b and Secs. 3.3 & 4.5.) 


Note a: Phase 6 consists of a local calculation, like phase 3, followed 
by a summing operation. In SIMPLE this phase is distributed 
throughout the other phases; however, this distribution does not 


anallyst 2. 


change the character of the demand placed on computational resources. 


\v 
Note b: The dotted box CF) will involve sequencing (NT ) or not ec as 
according to whether messages are or are not concatenated. 


Figure 6: Concurrency and connectivity in different phases of the cycle. 
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4.4. Dependence of time to compute on number of zones and number of processors 


Although not attempting quantitative estimates, we discuss how the 
time to compute varies with the size of the mesh and the number of processors. 
Each phase of SIMPLE, as shown in Fig. 6, will be considered separately, 
as different phases exhibit different dependencies. Several areas of 
uncertainty confront evenqualitative estimation; in particular the detailed 
Operation of a communications facility necessary to a dataflow computer 
bears on the dependence. This operation has not been modeled to date; for 
this reason we confine our discussion to two limiting cases. The first case 
leans toward keeping the communications facility local; i.e. communications 
between nearest neighbors are stressed. The second case posits a general 
purpose, global communications facility without worrying about its realizability; 
the intent is to see what contribution to speed such a facility could make if 


it were available. 


4.4.1, Case definitions 
Case 1: connectivity restricted to nearest neighbor plus "tree". As case l 
we posit a restricted communications facility. We imagine processors 
connected like a two dimensional mesh, with a provision for two-way communications 
between each zonal processor and its neighboring nodal processors. I.e. the 
processors are divided into two classes, and a given direct communication is 
always between two members that are in different classes. Fig. 7 illustrates 
the connectivity. In addition, we posit additional processors and connections 
to perform such functions as global sums and the taking of maxima. Each zonal 
processor is imagined to be a twig of a tree. At nodes of the tree there are 
processors of a third class (the "tree" class) which can operate to 

a) accept a flow of values from twig to root, operating by program to 


select and pass on the largest value, to sum the incoming values and 
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pass on the sum, etc, or 


b) accept a value flowing from root to twig, providing either for 


fanout to all zones or for selective routing to a given zone. 


For simplicity we imagine that the mesh of the SIMPLE problem 
is roughly square, and that the D zonal processors are arranged in a 
square array. To use the configuration of case 1, we imagine that each 
zonal processor is assigned about N/D contiguous zones; i.e. each zonal 
processor operates on a "Super"-zone of the mesh, as discussed in Sec. 3. 
As indicated in Sec. 3., the connectivity between super-zones (and the 
corresponding super-nodes) will show the same pattern as does Fig. 6. 
The assignment of pieces of dataflow graph to processors is static, and does 


not change during execution of the program. 


Case 2: "general-purpose" communication. Suppose that the dataflow computer 


has a communications facility that is ideal in the sense that each processor 
can send a message to any other, with a rate of flow constrained only by 
the bandwidth of the processors. We define parameters as follows: 

T, = time for a processor assigned to a node or zone of the dataflow 


graph of SIMPLE to enter a communication into the communications 
facility, for forwarding to another processor; and 


T, (D) = time for the communication, under the loading conditions at hand, 
to travel to its destination. 
T, must increase with D at least logarithmically; in practical terms it will 
probably grow more or less linearly. 
The assignment of processors to portions of the dataflow graph 
can be made as in case 1, but, as will be discussed below, there is an 


advantage in speed if processors can be reassigned during execution. In 


eh G. ie 
particular, during the Z-sweep of phase 4 it is an savantage to have each 
zonal processor assigned to a column of zones of the dataflow graph; during 
the R-sweep it jis an sdvantage to have each zonal processor assigned to 


a row of zones of the dataflow graph. 


4.4.2. Results 
Consider the SIMPLE problem for a mesh of N zones, running on a dataflow 
computer capable of computing a mesh of D zones without time-sharing of 
hardware. The running time will depend on the time to compute a cycle, 
as discussed previously. The time to compute a cycle will be a function 
of Nand D. Examination of the connectivity shown in Fig. 6 for various 
phases of the cycle leads to the results shown in Table 1. In Table 1 


the parameters Ty through T. will be different for the two cases, and indeed 


7 
depend on details of the implementation. However, they do not depend 
substantially on N or D. 

In order to move to a quantitative estimate, one must both estimate 


the parameters T, through Ty for whatever detailed cases are to be judged, 


I 
and one must also determine the degree to which pipelining could make the 
total cycle time less than the sum of the times for the individual phases. 

Although the values of the T-parameters may vary between case 1 and 
case 2, it is to be noted that the dependence on N and D is of the same form 
for the two computers, except in phase 4, where the configuration of case 2 
promises a substantial advantage. This advantage could be obtained as follows. 
Assume for simplicity that N = pe and that the mesh is square, so that there 
is one processor for each row of zones and for each row of nodes, or alternatively, 


one processor for each column of zones and for each column of nodes. For 


the Z-sweep assign each processor to a column, so that one processor must 
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operate sequentially along its column. Because of the data dependence of 
the back-substitution method used, this involves no more computing time than 
would the assignment of one processor per zone and node. At the completion 
of the Z-sweep, reassign each processor to a row, in preparation for the 
R-sweep. In this reassignment each processor must send and receive field 
variables to and from all the other processors of its class. If the 
communications facility accepts messages as fast as the processors can stuff 


them in, then we find that the time to reassign is about as follows: 
Reassignment time = DT, +T,(D) P . (Eq. 4.1) 


Table 1, under Phase 4, shows the comparison of dependencies achieved 
with this capability, versus the simpler facility offered in case 1. (Note 


that Ty for case 1 is not the same as T, for case 2.) It is to be noted that 


4 
the advantage of the more general communications facility can be realized only 
if the facility supports "high bandwidth" in the sense of providing for complete 
exchange of messages among all processors. This total exchange must actually 
take place to make the scheme work. 

The square-root dependence shown for case 1 comes about because in 
a square array of processors with processing constrained to be sequential 
along a column (for example), then only one row of processors is in parallel; 


the other rows are waiting. As D is increased, the length of the row of 


processors grows as the square root of D. 


Phase 1: | Phases 2 & 3: Phase 4: calculate temperature Phase 5 Phase 6: Phase a 
boundary | calculate time energy input/ 
value nodes and Case 1: Case 2: step. totals. output. 
determi- | zones except communications "general purpose, 
nation. temperature. restricted to global" communi- 
nearest neighbor | cations facility. 
and "tree". 
Numerical 
method of 
SIMPLE T,VN/D (Tp+T3) 2 TWD TZN; but 
reducible 
Change to to something 
hi 
hypothetical apenceenle 
concurrent T_loggN 
method for 7 : 
matrix throug 
inversion increasing 
selectivity ! 
of feature ws 
Additional si N extraction. 
change to 
local deter- 5D See Secs. 
mination of 3.3 and 
time steps 4.5. 
Notes. a) Communication more general than "tree" + "nearest neighbor", even if available, can be effectively 


used only in phase 4. 


b) The issues of estimating the parameters T, through 1 is discussed in Sec. 4.4. 


1 
c) Phases 1 through 6 may overlap, so that, as discussed in Sec. 4.4, the cycle time may be less 
than their sum; in particular the results of phase 6 are not used in any loop calculation and phase 
6 can thus easily be pipelined. 


d) The mesh is assumed to be roughly square. 


Table 1: Form of dependence of time to compute a cycle on number of zones (N) and number of 


processors (D). 
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4.5. Input, Output, and Control Over the Extraction of Features 


For the first six phases of Table 1, the time to compute diminishes 
as the number of processors is increased. But this is not so for phase 7: 
SIMPLE requires the "wholesale" shipment of arrays to an external storage 
medium. As discussed in Sec. 3.3, the time to transmit N elements over a 
single transmission line has a lower bound that is proportional to N, and 
moreover is independent of how many processors are brought into the dataflow 
computer. Thus the generation of output threatens to consume a time that 
could become excessive. This threat can be countered by providing greater 
selectivity in reporting; i.e. one programs for the reporting only of 
significant features, and avoids communicating "masses of raw data”. 

In order to avoid swamping analysts even with present computers, 
Livermore Laboratory has assembled a powerful facility for computerized 
extractions of significant features from masses of data. At present 
the approach is to first compute a relatively "general" database, and then 
to exercise selectivity in the extraction of features. In order to make 
efficient use of a dataflow computer, one must shift to a greater emphasis 
on selectivity in generating the output which will form displays and/or 
"special purpose" databases. Without bringing selectivity into the generation 
of output, the linear growth of time to report an array with the number of 
zones is apt to dominate the computation. Even if it does not, the increase 
in size in any “general-purpose” database is a serious drawback. 

The SIMPLE code offers a small beginning in this direction in the 
option in the EDIT subroutine by which one can eliminate the reporting of 
nodes and zones that show less than a specified degree of motion. More is 


doubtless done in other programs to provide selective reporting, but still 
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more must be done as the scale of computations is increased. As a specific 
example along these lines, an analyst could specify that the value of say 
pressure be reported out for any given zone only if the pressure had changed 
by more than ten percent since the last report for that zone. Thresholds 
(e.g. the "ten percent") might be varied during execution. 

If selectivity of reporting is made to increase in near proportion 
to the number of zones, then input and output can be handled with a structure 
for which Phase 7 of Figure 6 serves as a point of departure. As discussed 
in Sec. 3.3, however, more trees and more flexible control over them would be 
of advantage. The goal of selectivity would be to keep the formation of 
output from overwhelming the analyst and from taking too long. Through 
increasing selectivity with the number of zones, one can keep the growth 
rate of the time to form the output from growing as fast as the number of 
zones; one might hope to contain it to a logaritnmic dependence. 

Further discussion is outside the scope of this work, but would 


be appropriate for a future project. 
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5. Translation of SIMPLE from FORTRAN into VAL 


5.1. The balancing of objectives 


In developing a code in any language, the following desires are 


balanced: 
-l. Express the algorithm as clearly as possible; and 
.2. Make good use of computing resources. 


In producing VAL code for a dataflow computer whose hardware is not yet 
fully specified, it would also be desirable to illuminate constraints on 


concurrency, and in particular to: 


.3. Organize the code so as to make clear which aspects of SIMPLE 


place which demands on hardware speed and connectivity; and 


-4. Extend the SIMPLE problem by sketching more of the input and 
display functions, because these functions are essential to any 
actual problem of the SIMPLE type and place demands on both 


language and hardware not made by other phases of the problem. 
In addition, since we are translating from FORTRAN, it would be desirable to: 


.5. Make VAL code that can easily be compared with the given FORTRAN 


code. 


These desires conflict in various ways, and any VAL code will reflect 
a balance between them. In support of items .1 and .3 we group variables 
into bunches (such as START) in a way that will either decrease efficiency 
or place extra burdens on compilation. The decrease in efficiency would 


take the form of sending a longer message where a shorter one would suffice; 


ee ee 


concurrency at the level of detail shown in Fig. 6 would not be affected. 

In support of items .2 and .3 we have sacrificed item .5 to the 
extent of introducing new variables (STRESS, GX, GV) that are tensors 
defined in each zone, in order to demonstrate that the connectivity demanded 
by SIMPLE in computing the acceleration of each node is only nearest-neighbor, 
in contrast to the first impression given by lines 580 throug 593 of SIMPLE (1979). 
Appendix B illustrates demands placed on hardware by various parts of the 
SIMPLE problem, as expressed in VAL. 

In support of .4 we have indicated possible extensions of the VAL 
language that seem to be needed to help with the extraction of significant 


features from an array, and with input and output in general; these are: 
a. the stream type of value for input and output; 


b. the addition of concatenate to the list of forall operations, 


so that a stream can be formed quickly from a sparse array; 


c. the addition of an asymmetric merge operation on arrays to help 
in communicating a sparce pattern of change to an array; the effect 
is that one of the two arrays to be merged supplies default values 


which are overridden by non-empty elements of the other array. 


d. a form of forall eval max that extracts the lowest index at which 
the maximum value of an array of reals is found, in addition to 


the maximum value itself. 


In support of item .5 we use the names of variables as given in the 
FORTRAN code except where different structures are introduced. 


In connection with item .1, it is to be noted that the algorithm of 
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SIMPLE evolved over decades in a process that was influenced by often 
conflicting needs for single-step accuracy, stability, and economy; for 
this reason the algorithm will not be found to show a simple structure, no 
matter how it is displayed. 

The FORTRAN code, including comments, runs some 1749 lines, and 
a complete translation into VAL would be of roughly the same size. Because 
the SIMPLE code in FORTRAN is always undergoing minor revisions, as is the 
VAL language, it seems beside the point to carry through details of translation 
that duplicate the form of translations already made. We rely on Hirshman 
(1978) and Woodruff (1979) to demonstrate that many FORTRAN passages can 
be translated efficiently into VAL; some of thes passages are referred to in 
what follows. Rather than duplicate their work, we present a more detailed 
code of the main module of the VAL program for SIMPLE, as a framework in 
which to view passages that deal with specific acitivities of computation. 
In this framework we highlight the issues that were encountered in a detailed 
review of the entire SIMPLE program, focusing on areas, notably input and 
output, that require further development of the VAL language. Our intent 
is both to show how the present edition of VAL is sufficient to translate 
most of the FORTRAN, and to show clearly certain extensions of VAL that . 
appear necessary for a complete translation, including the extension of 


SIMPLE to provide for the extraction of significant features from arrays. 
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5.2. Samples of VAL code 
5.2.1. Overall form of the VAL translation of the SIMPLE code 
As discussed by Ackerman and Dennis (1979) a VAL program consists of 
a collection of external function modules, each of which may contain internal 
function modules. One internal module cannot invoke another. We present 
the VAL code for SIMPLE as a main external function module called SIMPLE VAL, 
along with an external function JES which is a table look-up used by two 
functions internal to SIMPLE VAL; in addition some external routines presumed 
to be in a system library are used, such as sine, cosine, and square root. 
The bulk of the code will be the function modules internal to SIMPLE VAL. 
Each external function module consists of: 
header, 
type definitions, 
external function declarations (e.g. for library supplied utilities) 
internal function definitions, and 


body. 


In the code that follows there will be gaps, indicated by comments, 
such as passages that can be filled in from the work of Hirshman (1978). 
Conments will also indicate where a possible extension of the VAL language 
has been invoked to overcome one or another obstacle of the type discussed 
in Sec. 5.1. 
The program will consist of the external functions 
SIMPLE_VAL 
JES VAL 
SIN 
cos 
SQRT %square root, 


and might well be augmented by system utilities to indicate running time, etc. 
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Because certain features of SIMPLE VAL are understandable only in the context 


of JES VAL, we present JES VAL first. 


5.2.2. JES_VAL 

The FORTRAN code of SIMPLE contains a table look-up subroutine named 
JES. In SIMPLE VAL this look-up is used by two internal functions: ENERGY HYDRO 
and ENERGY HEAT. Because it is called by two internal functions, we construct 
the VAL translation of JES as a function external to SIMPLE VAL. 

JES operates on numbers and not arrays; it can be applied fully 
concurrently be each zonal processor to the elements of a given zone. 

An issue in translating is that the FORTRAN version of JES uses 
many GOTO statements, and these statements are not supported under the more 
structured philosophy of VAL. Thus the JES code must be re-expressed in 
an IF-THEN-ELSE form. In arriving at the code displayed below, it was 
very helpful to first flow chart the FORTRAN CODE. Another issue is that 
in FORTRAN, JES is employed not by calling "JES", but by calling one or another 
of the entry points IES1 and IES2; these will correspond to the parameter 
ENTER in JES VAL, our VAL equivalent of JES: ENTER = 1 corresponds to IES1; 
ENTER = 2 corresponds to IES2. 

Partly because it uses a method of successive approximations, SIMPLE 
employs JES several times in the calculation of energy for a single zone. JES 
(for ENTER=2) returns energy or (for ENTER=1) pressure as a function of 
temperature (TARG1) and density (RARG1), by means of a table lookeup. The 
table is organized as a two dimensional array of rectangular regions on 
the (temperature, density)-plane, with a region specified by a pair of 
integers NT and NR. The returned value is supplied by a procedure that 


has several steps: 
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@ Search for and find the NT, NR for the region that contains the 
"point" (TARG1, RARG1); 


@ Per line 1353, statement 5310 of SIMPLE (1979), evaluate a function 
of NT and NR to obtain an integer M as index to an array of sets 
of coefficients -- e.g. AESIM], etc. The set of coefficients 


for a found M will be used to interpolate. 


@ Obtain the value to be returned by means of a quadratic interpolation 


~ function, using the set of coefficients AESLML, etc. 


The running time of SIMPLE (at least for a sequential machine) is significantly 
reduced by saving NT, NR, and M as NTSVEN], NRSVEN], and MSVCN] for use 

as trial starting values for the search in the next invocation of JES. In 

the FORTRAN code NT (along with NR and M) is saved separately according to 
which of the two entry points (corresponding to ENTER = 1 or ENTER = 2) is 
invoked. Thus NT is saved in a two-element array, with one element for 

each possible entry point. We refer to the six saved numbers collectively 


as SV_REC, where SV_REC is a structure of type SV_REC_ type, defined by: 
type SV_REC type = record(NT, NR, M: arraylinteger]] %. 


The structure which we have called SV REC saved from a given zone 

supplies trial values for the next invocation of JES, which may be for 

the same zone, or for a different, usually neighboring zone, as the sequential 
processor steps from zone to zone. The facilitation of the search is still 
likely when a shift is made to a neighboring zone, because conditions change 
little from a zone to its neighbors. The speed advantage accrues because 


the sequential processor usually last invoked JES either for the same zone 
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or for a neighboring zone. When the last invocation was for a far-away 
zone, then SV_REC is no help; this does not affect the answer produced 
by JES, but does extend the time to find the answer. 

Now we turn to the issue of translation for a dataflow computer. 
Suppose, as suggested in Sec. 3, a dataflow computer has D zonal processors, 
each assigned to cover a “super-zone" composed of (about) N/D contiguous 
zones. When N>> D a given zonal processor will step sequentially from 
zone to zone in a "raster scan" over its N/D assigned zones, just as the 
sequential computer is specified by the SIMPLE code to scan all N zones. 


There are three options: 


a. Omit the use of SV_REC, and accept a slower look-up (noting that 
because many look-ups will be done concurrently, the speed is not 


so important as it was in the FORTRAN code). 


b. Create an array of SV_REC'’s, with one SV REC for each zone. This 
option maintains the speed, but as the cost of storing a factor 


of N/D more SV_REC's than are really needed. 


c. Cause each zonal processor to carry one SV REC along as it steps 


through its N/D zones. 


Option a) is easiest to implement, but is hardly an example of translating 
power. Option c) is both the most efficient and the most demanding, and 
is coded in Sec. 5.2.3, where it shows up in initializing SV prior to 
entering the main loop, and in Sec. 5.2.4 where it is discussed under 
ENERGY HYDRO. 


The VAL function module follows: 
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function JES VAL(ENTER: integer; TARG1, RARG1: real; SV_REC: SV REC type. 
returns real, SV_REC type) 
type SV_REC type = record(NT, NR, M: arraylinteger] ] 
let % The closing "in" is the the last line of JES VAL. 
% Set up constants for table; these are provided in the FORTRAN code by 
% subroutine SETUP acting via COMMON; we incorporate much of the equivalent 
% Of SETUP here. 
IZES, ITES, IRES: arrayLinteger] := C1: ...], ... 3 
TES, RES, AES, ... , PES: arrayLreal] := Ll: ...], ... 3 % End of set-up part. 
EXTT1, EXTR1: real := 1; 
N: integer := ENTER; % Change of name to conform to FORTRAN code 
NT, NR: integer := SV_REC.NTEN], SV_REC.NRENI; 
EXTT2: real := EXTT1 * TARG1; 
EXTT, TARG: real, FLAG, NT1: integer := 
if TES{NT] > TARG1 then 
| if NT <= ITESCNI]then EXTT2 / TESCNT], TESENT], 9, NT 
| else for Nl: integer := NT-1 
do if TESEN1] > TARG1 then 
if Nl > IESC€N] then iter N1 := N1-1 enditer 
! else EXTT2 / TESCN1), TES{N1], 1,N1 endif 
else EXTT1, TARG1, 1, N1 endif 
endfor 
| endif 
else if TESCNT+1] > TARG1 then EXTT1, TARG1, @, NT 
else if NT+2 = ITESEN+1] then EXTT2 / TES{NT+1], TESCNT+1], 9, NT 
else for Nl: integer := NT-1 


do if TESE€N1+1] > TARG1 then EXTT1, TARGI, 1, Nl 
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else if N1+2=ITESEN+1] then EXTT2 / TESEN1+1, TESLNI+1], 1, NI 
else iter Nl := Ni+1l enditer endif 
endif 
endfor 
endif 
: endif 
endif 
EXTR2: real := EXTRI * RARG1; 
EXTR, RARG: real, FLAG2, NR1: integer:= 
if FLAG=0 then 
if RESCNR1 >RARG1 then 
| if NR > IRES{INJ] then for Nl: integer := NR-1 
do if RESENR] > RARG1 then 
if NR > IRES£N] then iter N1 := N1-1 enditer 
else EXTR2 / RESENI, RESEN1], 1, Nl endif 
else EXTR1, RARG1, 1, Nl endif 
endfor 
— else EXTR2 / RESENRJ, RESENRI, @, NR endif 
else if RESCNR+1] > RARG1 then EXTR1, RARG1, 9, NI 
else if NR+2=IRESCN+1Jthen EXTR2 / RESLENR+1], RESENR+1J, 0, NR 
else for Nl: integer := NR+t1 
do if RESEN1+1] > RARG1 then EXTR1, RARG1, 1, Nl 
else if Ni+3 > IRESEN+1] then EXTR2/RESLNI+1], RES{NI+1], 1, N1 


endif 


| 
| 
| 
| 
| else iter Nl := Ni+1 enditer endif 
| 
| endfor 

i 


endif 


= AZ 2 


endif 
aise: if RESCNR] < RARG1 then for Nl: integer := NR 
do if RESLN1+1] > RARG1 then EXTR1, RARG1, 1, N1 
else if N1+3 > IRESEN+1}then EXTR2/RESCN1+1], RESENI+12, 1, N1 
else iter Nl := N1+1 enditer endif 
endif 
endfor 
else for Nl: integer := NR 
do if RES£EN1] > RARG1 then 
if Nl > IRES£NI]then iter Nl := N1-1 enditer 
else EXTR2/RESEN1], RESLN1], 1, N1 endif 
else EXTR1, RARG1, 1, Nl endif 
endfor 
endif 
endif; 
M: integer := if FLAG2=0 then SV_REC.M 
else IZESENJ+(ITESLN+1]-ITESENIJ-1)*(NR1-IRESLNIJ+NTI-ITESEN]) endif; 
SV_REC1: SV_REC_type := | 
if FLAG2=0 then SV REC 
else SV_REC replace[NT: SV_REC.NTLN: NT1; NR:SV_REC.NREN:NR1J; 
M: SV_REC.M(N:M]] endif; 
FUNC: real := AESEM1 + RARG * (BESLM] + RARG * DESIM]) 
+ TARG * (CES[M] + RARG * (FESTM] + RARG * GESLM]) 
+ TARG * (EESCM] + RARG * (HESLM] + RARG * PESLM]))); 
FUNC] : real := if ENTER=1 then FUNC * EXTT * EXTR 
else FUNC * EXTT endif 
in “closes "let" on line 2 of JES VAL 


FUNC], SV_REC1 endlet endfun % End of function JES VAL 
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5.2.3. SIMPLE VAL 
SIMPLE VAL is the main module -- i.e. the overall framework -- 
for the VAL code translation of SIMPLE. Because the functions internal 
to this module correspond to roughly 25 pages of FORTRAN code, the section 
of internal function definitions is abbreviated to a list of headers, and 
a discussion of salient features of these modules will be found in Sec. 5.2.4. 
The code that follows is a detailed statement of the overall structure 


of the VAL translation of SIMPLE. 


% Header: 
% Note presumed language extension to "stream" type for input and output. 
function SIMPLE VAL(INPUT A:  start-type; INPUT_B: stream 
[correction_type ] returns stream [out_phys type ], 
stream[out_cycle type], stream[out_edit_type], 


stream out_condition_type ) 


a~type definitions: 

type vector = record|[R,Z: real]; 

type zonal = array[array [real] ]; 

type zone_tensor = array[array[record[E,W: vector| |]; 

type nodal = array (array[vector] | 3 

type node_scalar = array[array[real]]; _ 

type start_type = record[DTNPH, TFLR, EDDT, P@, E®, RHOM, DTMIN, 
DTMAX, TMAX, COF, C1F, GAM: real; BC: record[U, D, L, R: integer]; 
LIM: record[KN, KX, LN, LX, DS: integer]; NCP: integer ]; 
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As shorthand we shall write "STATE" and “state_type" to refer to 
a list of the variables that define the state of the computation: 
state type = list[DTNPH, DTN, TNUP, ENCG, EDTIME, EDDT: real; NYCL: 
integer; P, Q, RHOJ, E, S: zonal; X, V: nodal; GX: zone_tensor; 
DTMIN, DTMAX, TMAX, COF, C1F, GAM, EDDT, TFLR: real; NCP: integer] 
type out phys type = "state_type"; 
type out_cycle type = record[NYCL: integer; DTNPH, TE, ENC, SKE, HN, WN, 
ENCG: real; DTEN, DTC2: record[DT: real; 
K, L: integer]]; 
type out_edit_type = "state_type"; 
type out _condition_type = stream; % language extension 


type correction type = stream; 


type lim_type = record| KN, KX, LN, LX, DS: integer]; % 4 fields correspond 
% to FORTRAN code KMN, KMX, LMN, LMX; DS describes implementation for 
% the implementation-dependent use of JES VAL shown in ENERGY_HYDRO. 


type SV_REC type = record[ NT, NR, M: array[ integer] ] 3 
% SV_REC discussed in Sec. 5.2.2 in connection with JES VAL. 


type SV_type = array[array[SV_REC_type]]; % Because of our choice of 
% option c) of Sec. 5.2.2, the array SV of type SV_type will have 

% dimensions of LIM.DS by LIM.DS, where LIM.DS squared is D, the 

% number of zonal processors of the dataflow computer. If option b) 

% were used, then LIM.DS would not have to appear in the program, and 
% the array SV would have N (number of zones in mesh) elements instead 


% of D elements. 


a AG 


% external function declarations: 

external JES VAL(ENTER: integer; TARG1, RARG1: real; SV_REC: 
SV_REC_type returns real, SV_REC_type) 

external sin(DUMMY: real returns real) 

external cos(DUMMY: real returns real) 


external sqrt(DUMMY: real returns real) % square root. 


% The bodies of the internal function definitions are omitted here; the 


% headers are listed for all internal functions of SIMPLE VAL: 


% 


ro 


INITIALIZE(START: start_type returns "state type") 
EDIT(STATE returns edit_type) 


BOUNDARY PROJECT(P, Q, RHOJ: zonal; X: nodal; GX: zone tensor; LIM: 


lim_type returns zonal, zonal, zonal, zone_tensor) 


VELOCITY(V: nodal; P, Q, RHOJ: zonal; GX: zone_tensor; DTN: real; 


LIM: lim_type returns nodal) 
POSITION(X,V: nodal; DTNPH: real; LIM: lim type returns nodal) 
HWORK(X, V: nodal; P, Q: zonal; DTNPH: real; LIM: lim_type returns real) 


ZONE _GEOM(X, V: nodal; MASS, S: zonal; LIM: lim_type returns 


zonal, zonal, zonal, zonal, zone tensor, zone_tensor) 


ENERGY HYDRO(E, P, Ad, RHO, DVOL, MASS: zonal; GX, GV: zone_tensor; 
SV: SV_type; DTNPH, C@F, C1F, GAM, DTMAX: real; LIM: 


lim_type returns zonal, zonal, zonal, zonal, SV type) 


% 


ra 


% 


% 


ho 
% 
% 


% 
% 


mae ae 


HYDRO TOTAL(V: nodal; MASS, E: zonal; LIM: lim type returns real, real, real) 


ENERGY HEAT(E, RHO, AJ, TEMP, MASS: zonal; X: nodal; SV: SV_type; 
DTNPH, TFLR: real; LIM: lim_type returns zonal, zonal, zonal, 
node scalar, node_ scalar, SV_type) 

HEAT_TOTAL(E, TEMP, MASS: zonal; CBB, DBB: node scalar; DTNPH, HN: real; 


LIM: lim_type returns real, real) 


TIME _STEP(TSO, YE: zonal; X: nodal; DTNPH, DTMAX, CF, C1F, GAM: real; 


LIM: lim_type returns real, real, real, real) 
PHYS REPORT("STATE": "state_type" returns “state-type") 


CYCLE REPORT(YE, TSO: zonal; NYCL: integer; TNUP, DTNPH, TE, ENC, 
SKE, HN, WN, ENCG: real; LIM: lim_type returns 


out_cycle_ type) 


MODIFY("STATE": "state_type"; DUMMY: correction_type returns 


"state-type") 


se sk Sk CSR 
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body of SIMPLE VAL 
The gross plan of the body is 
for STATE: state_type:= INITIALIZE(first(INPUT_A)); 


QUT PUT: stream:= null 


do if (condition) then QUT PUT 


else iter STATE:= main_cyle(STATE) enditer 
endif 


endfor 


In the detailed presentation that follows we split "STATE" into 
its fields (as given in the section of type definitions), and split 


“main_cycle" according to the phases illustrated in Figure 6: 


for START: start_type:= first(INPUT A); % read input stream 


STATE: "state_type":= INITIALIZE(START); 

OUT_PHYS: stream[out_phys_type]:= nul]; 

OUT_CYCLE: stream[out_cycle_type]:= null; 

OUT_EDIT: stream{out_edit_type]:= EDIT(STATE); 

CORRECTION: stream := INPUT B; 

HN, WN: real := 0.; 

% Set up temporary variables, other than those covered in STATE, 

% needed for main cycle: 

AJ, DVOL, TEMP, TSO, YE: zonal := array _fill(LIM.KN + 1, LIM.KX, 
array fill(LIM.LN + 1, LIM.LX, 0.))3 

GV: zone_tensor := array fill(LIM.KN + 1, LIM.KX, 
array_fill(LIM.LN + 1, LIM.LX, record[E, W: record[R, Z: 0.}]); 

CBB, DBB: nodal := array_fill(LIM.KN, LIM.KX, array fill 
(LIM.LN, LIM.LX, record[R,Z: 0.])); 

DTEN, DTC2, SKE, ENH, TE, ENC: real :=0. 


- 49 - 


LIM: lim type := START.LIM; 


% Set up array of SV_REC's to conform to option c) of Sec. 5.2.2. 

% Let DS be the greatest integer such that DS*DS = D, where D is the 

% number of zonal processors, as discussed in Sec. 3. 

SV: SV_type := 
let DS: integer := LIM.DS % implementation-dependent parameter. 
in array_fill(1, DS, array fil1(1, DS, record NT: array fill(1, 2, 0); 
NR: array fill(1, 2, 0); M: array_fill(1, 2, 0); EXTR: 0.)) endlet; 


do if DTNPH < DTMIN | TNUP > TMAX then 
let OUT_CONDITION: stream := 
if DTNPH < DTMIN then "DT STOP" || NYCL {| TNUP |{ DTNPH |{ DTMIN 
else "STOP TMAX" |[ NYCL {| TNUP || TMAX endif 
in QUT_PHYS, OUT CYCLE, OUT_EDIT, OUT_CONDITION endlet 
else iter 
% Phase 1 of cycle (see Fig. 6 for description of phases): 


P, Q, RHOJ, GX := BOUNDARY PROJECT (P,Q, RHOJ, X, GX, LIM); 


% Phase 2 of cycle: 
V := VELOCITY(V, P, Q, RHOJ, GX, DTN, LIM); % vector velocity 
X := POSITION(X, V, DTNPH, LIM); % vector position 


% "WN" part of Phase 6: 
WN := HWORK(X, V, P,Q, DTNPH, LIM) + WN; 


% Phase 3 of cycle: 
RHO,.AJ, DVOL, S, GX, GV := ZONE GEOM(X, V, MASS, S, LIM); 
E, P,.Q, TEMP, TSO, SV := ENERGY HYDRO(E, P, Ad, RHO, DVOL, MASS, 


GX, GV, SV, DTNPH, COF, C1F, GAM, DTMAX, LIM); 
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% Hydro part of phase 6: 
SKE, ENH, TE := HYDRO_TOTAL(V, MASS, E, LIM); 


% Phase 4 of cycle: 
E, RHOJ, YE, CBB, DBB, SV := ENERGY HEAT(E, RHO, Ad, TEMP, MASS, X, SV, 
DTNPH, TFLR, LIM); 


% Heat part of phase 6: 
ENC, HN := HEAT TOTAL(E, TEMP, MASS, CBB, DBB, DTNPH, HN, LIM); 


% Phase 5 of cycle: 
DTN, DTNPH, DTC2, DTEN := TIME_STEP(TSO, YE, X, DTNPH, DTMAX, 
COF, CIF, GAM, LIM); 


% Phase 7 of cycle (output and corrective: input): 
OUT_PHYS, EDTIME := 
if TNUP <EDTIME then OUT PHYS, EDTIME 
else OUT PHYS || PHYS _REPORT(STATE), EDTIME + EDDT endif; 
NYCL := NYCL + 1; 
OUT CYCLE := 
if MOD(NYCL, NCP)~= 0 then OUT_CYCLE 
else OUT_CYCLE || CYCLE_REPORT(NYCL, TNUP, DTNPH, YE, TSO, 
TE, ENC, SKE, HN, WN, ENCG) % Tines 766-773 of FORTRAN 
endif 7 
STATE, CORRECTION := 
if CORRECTION = null then STATE, CORRECTION 


else MODIFY(STATE, first(CORRECTION)), rest(CORRECTION) 
endif 


ae ae 


% An alternative approach to output would be to extract significant 
% features. For example, we illustrate a report of pressure for only 
% those elements of the array P that have changed by at least 10 
% percent since they were last reported. We assume an array P_LAST 
% as an iteration variable to carry the "last reported" value of P: 
PLAST, OUT PHYS SELECTIVE := 
if TNUP <EDTIME then nil %language extension for iteration variables 
else let COND: array [array| boolean] ] := 
forall K in [LIM.KN + 1, LIM.KX], L in [LIM.LN + 1, LIM.LX] 
construct  ABS((P[K,L] - P_LAST[K,L])/MAX(EPS, P_LAST[K,L] )) < .1 endal1 
in forall K in [LIM.KN + 1, LIM.KX], Lin[LIM.LN + 1, LIM.LX] 
construct if COND then P_LAST[K,L] 
else P|K,L] endif endall, OUT PHYS SELECTIVE | 
forall K in[LIM.KN + 1, LIM.KX], L in [LIM.LN + 1, LIM.LX] 
eval concatenate “Zlanguage extension 
if COND then null 
else record[P: P[K,L]; K: K; L: L_] endif endal1 
endlet 
endif 
% end of example of feature extraction 
enditer 
endfor 


endfun % SIMPLE_VAL 


Shoe 
5.2.4. Discussion of functions internal to SIMPLE_VAL 


INITIALIZE includes code tike the modules GENBC and GENPOS of Hirshman (1978), 


along with code of the form, say for pressure, 


% P: zonal := 


array fill(LIM.KN + 1, LIM.KX , array fill(LIM.LN + 1, LIM.LX, START.PQ)). 


EDIT is straightforward to translate, except for one demand which it places 
on the language: one needs to extract not only the maximum element of an 
array (as can be done with forall eval max) but also the K,L coordinates 
at which the maximum is found. Efficient support of this need requires 


hardware and language attention. 


BOUNDARY PROJECT includes the module GEOMETRY of Hirshman, the filling of 
P, Q, and RHOJ arrays (where RHOJ[K,L] = RHO[K,L] * AJ[K,L]), and the 
calculation of GX for boundary zones. The calculation of GX for interior 


zones is done in ZONE GEOM, and is discussed in Appendix B. 

VELOCITY: see Appendix B, where connectivity of the flow of data is discussed. 
POSITION is like Hirshman's module HYDRO; see also Appendix B. 

HWORK is essentially Hirshman's module of the same name. 


ZONE GEOM produces AJ and S like the module GENAREA of Hirshman, and also 


produces GX and GV by the algorithm discussed in Appendix B. 


ENERGY HYDRO contains parts like NEWE and NEWQ of Hirshman. However, 
NEWQ can be recast to use GX and GV in place of X and V, with the result 
that the calculation for a given zone draws only on values of that zone; 


j.e. no node-to-zone communication is required for the computation of Q when 


a ee 
GX and GV are made available from ZONE GEOM. 

Subroutine TEMPCAL of the FORTRAN code can be translated readily into 
a function module internal to ENERGY HYDRO. Both via TEMPCAL and directly, 
ENERGY_HYDRO calls the external function module JES VAL to compute pressure 
(from JES _VAL(1, TEMP, RHO, SV_REC)) and energy (from JES VAL(2, TEMP, RHO, SV_REC)) 
The value SV_REC supplied to JES VAL is in effect a hint where to start 
searching in a table; the value supplied does not affect the numerical results 
produced by JES VAL, but it does affect the time to execute JES VAL. 

If option b) os Sec. 5.2.2 were selected, coding into VAL would 
be easier because there the array SV would have N elements and be of the 
same shape as P, RHO, etc. For that option a typical use of JES VAL would 


be the production of a trial pressure Pl, as in: 


Pl, SV: zonal := 
forall K inf LIM.KN+1, LIM,KXJ, L in CLIM.LN+1, LIM.LX J] construct 
JES VAL(1, TEMPLK,LJ, RHO[K,LJ, SV[K,LJ) endall; %. 


Instead of using option b), we have chosen option c) as an example 
of the kind of demand on expressive power that occurs in tailoring an 
algorithm to an implementation. As discussed in Sec. 5.2.2 option c) saves 
storage by taking SV to be an array of only D (= number of zonal processors) 
elements; this can be much smaller than the N-element array used in option b). 
To express the N-element array Pl as a function of a D-element SV, it appears 
- necessary to first create a partitioned array equivalent to Pl, with a block 
of this partitioned array corresponding to an element of SV. 

The N interior zones of the mesh constitute a two-dimensional 

array of (LIM.KX - LIM.KN) by (LIM.LX - LIM.LN) elements. For simplicity 


we assume that both of these dimensions are exactly divisible by LIM.DS, 


- 54 - 
where D = (LIM.DS)2 is the number of zonal processors used, and we assume 
a physical configuration of a square array of LIM.DS by LIM.DS zonal 
processors. 

Each zonal processor is to be assigned a rectangular "super-zone" 


of the mesh, consisting of KS by LS contiguous zones, where 


KS 


(LIM. KX-LIM.KN)/LIM.DS 


and 
LS 


(LIM.LX-LIM.LN)/LIM.DS 


In place of -1 one expressed an N-element Pl in terms of a D-element SV, 
where one elemnt of SV corresponds not to one element of Pl, but rather to 
a block of KS by LS elements of Pl. Let P_BLOCK bea partitioned array 
equivalent to Pl; that is, while Pl is a 2-dimensional array of reals, 

P BLOCK is an array of LIM.DS by LIM.DS "little" arrays, each with KS by LS 
real elements, so that P_BLOCK must be a 4-dimensional array of reals. 


Option c) demands that: 
.@ computation proceed in each of the D blocks of P_BLOCK concurrently, ‘and 
e within a given block, computation proceed in a raster scan sequentially. 


The correspondence between Pl and PBLOCK is: 


P1LKI*KS + K@,L1*LS+L9] = P_BLOCK[K1,L1,K,L@1 . 


In other words, K1,L1 tell which block, and K@,L® tell which element 
within the block. It follows that (with the VAL convention for downward: 
: rounding of integer division) the [K,Llelement of Pl is given by 


4 
PIfK,LJ = P_BLOCKLK/KS, L/LS, MOD(K,KS), MOD(L,LS)]J . 


2 SG: 
The VAL code for producing Pl and SV in accordance with option c) follows: 


Pl: zonal, SV: SV_type := 
let P BLOCK: arrayfarray[array{array [real]]]] , SV1: SV_type := 
forall] K1 in €1, LIM.DS], L1 in (1, LIM.DSJ 


KS: integer := (LIM.KX-LIM.KN)/LIM.DS; % Assume exactly divisible 
(LIM.LX-LIM.LN)/LIM.DS; % " 


LS: integer : 
construct % P BLOCKEK1,L11 is itself a 2-dimensional array. 
for BLOCK: array{array[real]]:= array empty[arraylreal] ; % Element of P_ BLOCK. 
SV_REC1: SV_REC_type := SV[K1,L1]; 
K@: integer := 1 
do if K®> KS then BLOCK, SV_REC1 
else iter BLOCK, SV_REC1 := 
let BCOL: array{realJ, SV_REC2: SV_REC_type := 
for BCOL1: array[real]:= array empty[real]; 
SV_REC3 : SV_REC_type := SV REC1; 
L@: integer := 1 
do if L@ > LS then BCOL1, SV_REC3 
else iter BCOL1, SV_REC3 := 
let P_EL: real, SV_REC4: SV REC type := 
JES VAL(1, TEMPLE K1*KS+K@, L1*LS+L9], 
RHOE K1*KS+KM, L1*LS+L@], SV_REC3) 
in BCOLIELP: P_ELJ, SV_REC4 endlet; 
LO :=LP+1 
enditer 
endif 
endfor 


in BLOCK K®: BCOL , SV _REC2 endlet; 


KO := KO + 1; 
enditer 
endif 

endfor 
endal] 
in % Pl: zonal, SV: SV_type := 

forall K in CLIM.KN+1,LIM.KX], L in CLIM.LN+1, LIM.LX] construct 

P_BLOCKLK/KS, L/LS, MOD(K,KS), MOD(L,LS)J], SV1 


endlet % Completes production of Pl and SV. 


Because of the explicit reference to LIM.DS, a parameter of 
implementation, this example gives a glimpse of the type of expression 
needed when a programmer assists in compilation. It is generally recognized 
that hardware can be used more effectively if the programmer tailors the 
program to it. In simple cases one hopes that the algorithm will not have 
to be changed to effect such tailoring, but we have just seen a case in 
which the algorithm (though not its numerical result) did change. To 
facilitate compilation of the whole SIMPLE code, one might well express 
all the arrays in blocked (i.e. partitioned) form for internal computation. 
If this were done then the conversion to 2-dimensional form would not 
be done as part of the above example, but would be deferred to the 


generation of output, as in the module PHYS REPORT of SIMPLE VAL. 


HYDRO_TOTAL, like HWORK, is straightforward, being essentially the 


execise of the construct forall-eval-plus. 


ENERGY HEAT is the main bottleneck in the SIMPLE problem, because of 
the sequencing constraints due to the back~substitution method chosen 


for solving for heat flow. The sequencing constraints are illustrated 
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in Appendix B, Fig. B.1. The constraints are in the "R-sweep" and "Z-sweep" 
portions of subroutine CONDUCT of the FORTRAN code of SIMPLE. This code 
steps from one element of an array to another, using results of a previous 
element to calculate a next element. 

Subroutine CONDUCT saves TEMP as TS in line 1586, and then restores 
TEMP to TS in line 1673, so that after the execution of CONDUCT, TEMP is 
unchanged; what is calculated is really a temporary variable which we call 
TEMP1 in the code below. Its use is not to get a new TEMP, but rather to 
help in adjusting E to account for heat flow. The FORTRAN code partially 
inializes arrays A and B outside of the sweeps; we incorporate this initial- 
ization into the sweeps. The VAL arrays CBB and DBB are like those of 
the FORTRAN code, but re-indexed to clarify the connectivity actually 
required (see note be following the VAL code below). The production of 
TEMP1 in the VAL code for ENERGY HEAT would then appear inside a LET construct 


as follows:- 


% Z-sweep (per line 1612 of the FORTRAN code of subroutine CONDUCT) 
TEMP1: zonal := tet TEMP2: zonal := % Z-sweep calculates TEMP2 
forall K in [LIM.KN + 1, LIM.KX] construct 
Tet A, B: array[real]:= % range over L 
for L: integer := LIM.LN +1; 
ACOL, BCOL: array[real]:= array fil1(LIM.LN, LIM.LX, 0.), TEMP[K] 
do if L > LIM.LX then ACOL, BCOL 
else let DUM1: real := SIG[K,L] + CBB[K,L] + CBB[K,L-1] * (1 - ACOL[{L-1]) 
in iter ACOL, BCOL := ACOL[L: cBB[K,L]/ DuM1], BCOL[L: sIG[k,L] * 
TEMP[K,L] + CBB[K,L-1]* B[K,L-1] /DUM1]; 
L := Lt+l 


enditer endlet endif enafor 
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% ... ALPHA, BETA FORWARD 
in for L: integer := LIM.LX; TCOL: array[real] := TEMP[K] 
do if L <LIM.LN + 1 then TCOL 
else iter TCOL := TCOL[L: A[L] * TCOL[L+1] + B[L]]; L := L-1 enditer 
endif endfor endiet endall % end of Z sweep; returns TEMP2 
in % Feed TEMP2 through R-sweep to produce TEMP1: 
% R sweep 
let A, B: array[array[real]] := 
for K: integer := LIM.KN + 1; A2D, B2D: array[array[real]] := 
array fill(LIM.KN, LIM.KX, array fil1(LIM.LN, LIM.LX, 0.)), TEMP2 
do if K > LIM.KX then A2D, B2D 
else let ACOL, BCOL: array[real] := 
forall L in [LIM.LN + 1, LIM.LX ] DUM1: real := SIG[K,L] 
+ DBB[K,L] + DBB[K-1,L] * (1- A2D[K-1,L]) 
construct DBB[K,L] / DUM1, SIG[K,L] * TEMP2[K,L] + 
DBB[K-1,L] * B2D[K-1,L] / DUM 
endall 
in iter A2D, B2D := A2D[K: ACOL], B2D[K, BCOL]; 
enditer endlet endif endfor 
% ALPHA, BETA FORWARD SWEEP 
in for K: integer := LIM.KX; T2D: array[array[real] | := TEMP2 
do if K <LIM.KN + 1 then T2D 
else iter T2D := T2D[k: 
forall L in [LIM.LN + 1, LIM.LX ] 
construct A[K,L] * T2D[kK+1,L] + B[K,L] 
endallk K := K-1 


enditer endif endfor endlet endlet % Returns TEMP1 


Notes: 
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In VAL the syntax for operating on a two-dimensional array with 
a forall construct over one index and a for-iter over the other 
index is different according to which index is’ subjected to which 
construct. For this reason the Z-sweep and the R-sweep, which 


look much the same in FORTRAN, look different in VAL. 


The FORTRAN code uses an awkward convention in indexing CBB and DBB, 
with the result that there appears to be more coupling of array 
elements than is in fact the case; to clarify this we write 

CBB[K,L | in place of what in the FORTRAN code would be written 
CBB[K-1,L]; similarly we write DBB[K,L]in place of DBB[K,L-1]. 


In FORTRAN only one edge of the array A is initialized prior to 
the loop; in VAL it was convenient to initialize the whole array. 
The VAL code re-initializes A in the R-sweep. This is permissible 
because although the A array is operated on in the Z-sweep, the 


only column that matters (i.e. LIM.KN) is not changed in the Z-sweep. 


HEAT_TOTAL uses forall eval plus. 


TIME_STEP combines Hirshman's module TINCR with the calculation of DTEN, 


which in the FORTRAN is done in subroutine CONDUCT. Calculation of KC, LC, 


KEN, and LEN is not done in TIME STEP, but is deferred to CYCLE REPORT. 


PHYS REPORT is similar to EDIT. 


CYCLE_REPORT is straightforward except for needing the coordinates of an 


array where a maximum or minimum value is found, as was the case with EDIT. 
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MODIFY is an augmentation of SIMPLE to allow for real-time interaction with 
an analyst; e.g. MODIFY is to provide for receiving a change in say DTMAX, 
or even for receiving an entire "STATE", as would be needed to restart 


the computation after an analytic "catastrophe". 


= 6ic= 


6. Conclusions and Possible Next Steps 
6.13 Speed, input-output, and expression of the abstract algorithm 

As shown in Table 1, except for outputting results, the application 
of D processors configured as a dataflow computer can reduce the execution 
time of the SIMPLE code by a factor of at least D%. The sequencing constraints 
that limit improvement to this factor occur in the calculation of heat flow, 
as illustrated in Fig. 6. These constraints stem from the method chosen 
in the SIMPLE code for the inversion of a tri-diagonal matrix: back-substitution. 
It would appear feasible to find or develop a method with weaker sequencing 
constraints. If this were done, then all phases of the program, except 
output, would execute in times that decrease at least as D/log D with increasing 
D. 

As discussed in Sec. 4.5, the outputting of results called for 
in the SIMPLE code amounts to a "dump" of raw data. There is a minimum time 
for such a dump that grows with the size of the mesh and is independent of 
D. As discussed in Sec. 4.5 and illustrated at the end of Sec. 5.2.3, 
it appears essential to pre-process the data so as to extract significant 
features. If this is done, then output need not be a bottleneck. 

The VAL language is demonstrated as satisfactory for the expression 
of the SIMPLE problem as an abstract algorithm, provided that certain extensions 
are made in it. These extensions are listed in Sec. 5.1 and their use is 
shown in Secs. 5.2.3 and 5.2.4. The need for additional extensions to promote 


efficiency of execution is discussed below. 
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6.2. Implications of the spatio-temporal structure of the algorithm 
Following Holt (1979) we have analyzed the SIMPLE problem as given 


in an abstract algorithm expressed first in FORTRAN and then translated into 
VAL. The algorithm expressed in either language is called ‘abstract' when it 

is viewed as independent of physical arrangements in space and time for its 
execution. Our analysis of the SIMPLE algorithm in terms of role diagrams 
reveals spatial and temporal structure which will have to be found in any and 
all implementations. For example, by tracing through the algorithm for 

possible references to computational variables we discover the existence of 
algorithm-defined times when some number n of such variables must be co-maintained. 
This in turn implies that in any implementation of the algorithm there will 

have to be available, for some period, a space large enough to hold n values. 
(As the algorithm is to be executed by electronic circuits, this number n places 
a lower bound on the physical space which the algorithm can occupy.) To 

be more specific, ElJ,K], P£J,K], QfJ,K1, etc. meet in a zone and phase 

shown in Fig. 6 and in a relational sense define a time and location. 

As a second example, we discover in Fig. 6 that for any instruction 
of the main loop there are times-- i.e. phases -- when a given instruction 
may be executed and times when it certainly will not be. In other words one 
can determine prior to execution and independent of implementation that in 
any given phase a certain large majority of the instructions of the main loop 
will not be called. This property can be used both to guide compilation and 
also to guide the design of hardware for a dataflow computer: it suggests 
a programmable instruction cell that can make ready first one instruction 
and then another, much like a sequential processor. 

Finally the discussion of Sec. 3 and Figs. 3, 4 and 6 show that only 


a few of the myriad possible patterns of communication are actually needed 
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for a set of processing resources to execute the SIMPLE problem. In 
configuring a dataflow computer there are many possible alternatives for 
the arrangement of processing units, instruction cells, packet memory, 
and communications resources. Different arrangements offer different 
advantages for different problem classes, and place different demands on 
compilation. As discussed in Sec. 3, any hardware arrangement will reflect 
compromises which will detract from the execution of some classes of problems. 
Prior to large-scale investment, these relations between physical arrangement 
and problem class need to be examined in connection with various sample 


problems. 


6.3. The balance between programming ease and efficient use of hardware 


As a first step in exploring relations between hardware and 
problem class, VAL was employed to help express a problem in hydrodynamics 
in support of two anticipated tasks, relative to a dataflow computer that 


is not yet fully specified: 


.1. the design task of choosing a physical arrangement of hardware 


resources suitable to the SIMPLE problem; and 


.2. the compilation task of mapping the coded problem into machine 
instructions appropriate to a given physical arrangement of 


resources. 


Both tasks concern the mapping of a problem onto physical resources. The 
mapping is done in two steps: coding in a source language (VAL); followed 
by compilation which maps the source language into machine instructions. 


Historically a source language has been intended for the expression of a 


2 ba ce 


problem as an abstract algorithm -- ‘abstract' meaning that the algorithm 


was not tied to a particular physical arrangement of resources. But note: 


.3. To achieve efficient use of resources a programmer must allow 
for at least some features of implementation (e.g. “multiply” takes 


longer than “add"). 


.4. If the physical arrangement changes too much, a given source 


language become inappropriate. 


Indeed concurrently operability of resources contributed to the need to 
express concurrency in the problem, and hence to the need for VAL; i.e. VAL 
is superior to FORTRAN in expressing concurrency. A source language is 
shaped in part by assumptions concerning the physical arrangement of 
computational resources. FORTRAN was designed to facilitate a two-step 
mapping of a problem to machine instructions. In step one FORTRAN is used 
to map the problem essentially into instructions for a machine that is 
an idealized sequential computer -- idealized for instance in that it is 
imagined to have a random-access memory so big as not to be a constraining 
factor. In step two the FORTRAN code is compiled into machine code for 
an actual machine that departs in limited ways from the idealization -- 
e.g. by using a "small" random-access memory backed up by secondary storage. 
As FORTRAN corresponds to an idealized sequential computer, VAL 
presently corresponds to an idealized dataflow computer -- e.g. a dataflow 
computer imagined to have so many instruction cells that the number poses 


no constraint on how a problem might be executed. Note that: 


.5. A program like SIMPLE is a major task for programmers who can 


afford to learn the salient features of implementation; 


wae 


.6. The program is expected to run many hours per execution, and to 
be executed many times on a machine that costs enough to justify 


a large investment in efficient execution; 


./. The program is written to answer questions of physics that are 
progressively better answered as larger mesh sizes become executable 
in a day's run; the need for answers to these questions justifies 


a large investment in speed of execution. 


Whatever hardware design is chosen, the resources of a dataflow computer will 
be more complex than those of a sequential computer, and less susceptible to 
fully automated resource allocation. Within the dataflow context, the 
balance between ease of programming and efficiency weighs more toward the 
demand for efficiency. For problems of the SIMPLE type it appears unwise 

to force a separation between source-language programming and resource 
allocation. Some current languages -- e.g. PL/1 -- provide facilities for 
the control of resources; however these facilities are added ad hoc to 

a language that conceptually is inhospitable to the expression of physical 
arrangements in time and space. Because VAL encompasses the expression of 
concurrency, it offers at least a chance of extension to cover the control 
of resources in a more systematic way. The discussion of ENERGY HYDRO in 
Sec. 5.2.4 illustrates a related issue, the adaptation of the algorithm 


to a particular physical arrangement. 
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6.4. Extending VAL to support resource allocation 


We have seen that the SIMPLE problem has spatio-temporal structure 
that is germane to physical design, and for a given design, germane to the 
allocation of physical resources to execute the problem. Presently, a VAL 
program is thought of as having a "meaning" only to the extent that it 
defines a dataflow graph at the descriptive level of machine instructions. 
At this level of description the dataflow graph of SIMPLE is an enormous 
Tacework, with something on the order of a thousand computational events 
per zone, times thousands of zones. If a compiler works only from a dataflow 
graph at this level of detail, is it reasonsable to imagine that it could 
efficiently distribute all the instructions throughout the "space-time" of 
the computational resources? 

One might hope for some future "genious" to design such a compiler, 


but there is another approach: 


-1. Recognize that compilation will in fact use higher-level and/or 


auxiliary descriptions of the problem in allocating resources; and 


.2. Extend the programmer's task and his power of expression -- VAL -~- 
to express properties of the problem that can greatly reduce the 
burden of compilation -- properties such as those expressed in 


the role diagram of Fig. 6. 


In this approach the programmer would be supported in structuring the problem 
in a way that eases compilation for a given machine organization. This 
requires that the programmer be more explicit in guiding the "when" and 
"where" of program execution. It might be objected that such guidance depends 


too much on the details of a particular implementation, but this is not 
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necessarily so. There is a middle ground, where the programmer would 
formally express the information now conveyed by Fig. 6. The "where" 
implied by a "zone" of Fig. 6 is not directly a "machine location", but 
rather a relational location inherent in the SIMPLE algorithm. In that 
algorithm Ef{J,KJ, PLJ,K], Q{U,K], etc. meet many times, and in a relational 
sense meetings define "times and locations" -- e.g. Zone[K,L) of Fig. 6. 
In essence we see the programmer as calling the compiler's "attention" to 
grosser regions of a dataflow graph than appear at a machine-instruction 
level of description. The compiler would thus block out the assignment 
of gross regions to resources in a first phase, and then subsequently deal 
with further details. To pursue this course additional effort is needed 


to: 


.3. Bring under control the expression of the space-time aspect 
of an algorithm at different levels of detail, so as to guide 


the algorithm toward a particular machine organization; 


.4. Show what changes would be needed for VAL to express such aspects; 


.5. Evaluate the advantage of expressing SIMPLE and other examples in 


this way with respect to: 


a. what suggestions are offered for the organization of the 


resources of a dataflow computer; and 


b. how to distribute the burden of computing a problem over a 


given organization of resources. 
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Appendix A: Interpreting Role Diagrams 


SECTION DIRECTORY 


Section 
A.l. | Vertical string as path of a role player. 
A.2. Tokens 


A.3. : Circuits. 


Ad. Initialization and termination. 


A.5. ib Fragments 


> 
fo>) 


tt— Coincident activity of multiple role players. 


A.7. it Invariance of value. 


A.8. Oo Branching to alternative consumers. 
A.9. | Steering. 
A.10. a Encoding 
A.il, (xs Decoding 
A.l2. g—o Merging from alternative producers. 
A139, “+> -punating: 


A.14. os Unbundling. 


ATS. (ou) Compression of representation. 


ne bo : bt Copying. 


ee ee 


' 


A.17. a an old value. 


A.18. Operations (+, -, etc.) 


Rakes ee Buffered communication. 


a PS 


Appendix A: Interpreting Role Diagrams 


Throughout the report we have used role diagrams, invented by A. W. 
Holt (1979) to show the flow of values carried by physical actors. The 
notation presented here allows us to distinguish participations of actors 
in activities according to whether they are coincident, concurrent, alternative, 
or sequenced. 

The interpretation of role diagrams differs from that of dataflow 
graphs in that the former is based on this attitude: anything that is (even 
a value) must be someplace. Hence the flow of a value is a flow of effect 
over physical actors. A role diagram can be partitioned into strips; each 
strip is a locality in system space, and thus a place where some actor is 


resident. 


A.1: A vertical line is read downward as the advance of a role player 

(i.e. an actor) from one state to another through a sequence of activities. 

A state is drawn as a vertical line segment; an activity is drawn as a box. 
Here we show a role player "carrier of the value PRESSURE" proceeding through 


activity 1, followed by activity 2. 


ay ee 


A.2: The vertical line can be thought of as marked by a token. The position 
of the token shows the state of the role player. The token for pressure carries 


an inscription which states the value of the pressure. 


A.3: Circles at the top and bottom of a vertical line denote the same location 


of a circuit. In other words the figure 


denotes a cyclic progression through activity 1, activity 2, activity 3, back 


to activity 1, and so on. 


A.4: If a role P is initialized in activity 1 and terminated in activity 3 


i 


-we draw the following. 


Note that the initiation of a role (shown in activity 1) requires that a 


physical actor be on hand to play the role. 


ney ge 


A.5: In contrast to A.4, a fragment of a longer chain is drawn 


A.6: When several roles participate in a common activity their coincident 


participation is denoted by horizontal links. 


T 


STRESS 


As shown, P and Q must coincidently be present at the creation of STRESS. 


The horizontal line of boxes converts inputs (above) to outputs (below). 


A.7: The diagram A.6 indicates that P and Q change values as a consequence 
of taking part in the creation of STRESS. If we wish to indicate no change 


of value of P, we draw 


P Q 
| aa 
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A.8: A role can branch into alternative states, shown as 


P 


‘ 


A.9: In case of a branch, the choice of path can be resolved by interaction 
with other value-carrying roles. Suppose that exactly one of Bl or B2 will 
be present, and will resolve the choice for P; then A.8 could be filled out 


as 


A.10: In drawing a diagram with two alternative states, such as Bl and B2 in 


A.9, it may be convenient to pull the two lines into one: 


B 
| 


This pulling together is not an "objective" fact of the "system", but rather 
a matter decided by the drawer of the diagram. He decides to view the distinction 
formerly borne by the separation of the lines as “encoded” into an attribute of 


a token that travels on the joined line. 


ay (oa 
A.11: If the person who draws the diagram has encoded B1 and B2, as in A.10, 
then in drawing A.9 he would have to "decode" them -- i.e. to reporduce 


separated lines, one for each of the encoded alternatives. In this case A.9 


would be drawn with a fork: 


A.12: Two activities can be alternatives to the production of a single state, 


in which case two states of a role can merge. 


A.12 can be compared with A.9. Lines joined by branches and merges of a role 


form a state component of a Petri net. 


A.13: For convenience of presentation one may wish to bundle several roles 
together and picture them as a single "cable", as in an image of cabling 
together of different "wires". We illustrate this by roles A, B and C which 


are "cabled" into a compound strand called L . In other words, 


L = {A,B,C }. 


Unlike encoded alternatives (see A.10) all the roles of a bundle can be 


concurrently played 
A.14: Unbundling corresponding to the undoing of A.13 is drawn as follows. 


| 
L 
| 


A.15: Brackets around a row indicate that the row is compressed from a 


more detailed diagram shown elsewhere; for example the figure 


| | | 
DTNPH X 


is compressed from 


i 
DINPH =X 


DTN 
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A.15.1: The outputs of a bracketed row can be produced by an internal loop, 
containing internal variables. TNUP is such a variable in the following 
diagram, where 


| | | 
TMAX DT 


7 as en 


is compressed from 


o—fwas) Cr) (x) 


calculate X, V rh 


TNUP > TMAX ? 


By ae 


A.16: The following illustrates fanout. 


| 
P 


a 


(This notation was used in A.15.1.) 


A.16.1: Fanout can also be shown as follows. 


—F 


A.16.2: We link two boxes by a double bar to assert identity of output values; 
the following asserts that after the occurrence of the activity, B and B' 
carry copies of the same value; the figure does not assert anything about the 


relation between inputs, nor about the relation between inputs and outputs. 


A.17: The following illustrates the saving of the value of P as OLD P, while 


P is changed. 


(This notation was used in A.15) 


-79 - 


A.18: On occasion we indicate arithmetic operations on values, as in this 


picture. After the activity of the row occurs, C carries the value A+B. 


| 


A.18.1: If A is a matrix, then B as the sum over the elements of A could be 


pictured as follows. | 

A 
= 
B 
| 


A.19: Buffered communication. A fragment of Figure 2 (of the main report is 


(buffer) (buffer) 


This can be expanded to 


The figure .2 contains the fragment 


a a 


for which we introduce the abbreviated notation: 


ABs 


The slanted bar asserts that the lower activity consumes something produced 
in the upper activity, and that a buffer not explicitly shown mediates the 
transfer from the producing to the consuming activity. With this notation, 


Figure 4 of the main report is transformed into Fig. 5. 
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APPENDIX B 


Notes on Fitting the SIMPLE Code into Role Diagrams and VAL Modules 


Figure 6 of the main report somewhat schematically shows the 
connectivity of communication among processors, when one processor is assigned 
to each nodal and each zonal region of the dataflow graph. In this appendix 
we discuss the connections in more detail, and also discuss certain ways 
in which the algorithm of SIMPLE has been restated to clarify the connectivity. 
The objective is to help in considering hardware requirements, and to clarify 


aspects of the translation from FORTRAN into VAL. 


B.l. Interpretation of the cycle 


Fig. 6 shows phase 3 as producing new values for zone [K,Llas 
follows. 


zone 
CK,LJ 


.1: Schematic representation of production of zonal value. 


The fragment .1 is a schematic picture of an activity at zone K,L 
that draws on values from the four neighboring (i.e. corner) nodes to feed 
into the production of new values for the zone. With the indexing convention 
defined in Fig. 1 of the main report, one sees that the fragment .1 stands for 


the connections shown in .2: 


ee ae 


node 
[CK-1,L] node 
LK,LJ 


node 
[K,L-1] 


node 
{K-1,L-1] 


.2: Completed fragment showing all connections of nodes to a zone. 


The nodal values are a vector (with R and Z components) for velocity 
and a vector for position at each node. The corresponding type definitions 


and declarations in SIMPLE VAL are: 


type vector = record[R, Z: real]; 


type nodal = arrayfarray[vector]]; 


X, % position 
V: % velocity 
nodal %. 
The correspondence between these names as used in SIMPLE VAL and the names 


used in the FORTRAN code of SIMPLE is: 


FORTRAN code VAL code 
R X.R 
ra X.Z 
U V.R 


W V.Z 
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In order to clarify the connectivity, as well as to eliminate some 
unnecessary arithmetic, we introduce auxiliary variables, starting with 
a kind of tensor -- GX -- that describes the diagonal dimensions of each 
zone: 


GXLK,LI.W GXIK,LI.E 
(a vector) (a vector) 


oe 


.3: Definition of GX. 


GX is, at least in spirit, a tensor; GX[K,LJ.W is the vector difference 
between the vector position at the nortwest corner and the vector position 
at the southeast corner. GX is produced for interior zones by ZONE_GEOM 
in phase 3, and for boundary zones by BOUNDARY PROJECT; in the first case 


the defining relation is 


4. type zone tensor = array array record E, W: vector ; 


GX: zone_tensor : 
forall] K in CLIM.KN+1, LIM.KXJ, L in CLIM.LN+1, LIM.LXJ construct 
recordLE: X{£K,L]- X{K-1,L-1]; W: X(K-1,L)- X{K,L-1]] endall; %. 


Note that X is a vector, so that .4 is a shorthand expression; strictly speaking 
one must define a subtraction function with vector arguments. This is 

easy to do, but clutters the presentation. With the understanding that 

we are abbreviating, we shall apply "-", "+" and multiplication by a scalar ("*") 


to vectors. The node~to-zone communications needed to form GX are shown in 
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the picture .2. The auxiliary variable GV is a zonal tensor formed from V 
in exactly the same way that GX is formed from X. 

Now we address phase 2 and the calculation of V. Prior to 
communicating from the zones around a given node to the node, a tensor 
STRESS is calculated for each zone; this calculation for a given zone 
draws only on array elements for that zone. The computation covers boundary 


zones, set up in phase 1, as well as interior zones. 


<9 STRESS: zone tensor := 
forall K in LLIM.KN, LIM.KX+1], L infLIM.LN, LIM.LX+1] construct 
recordLE: (PLK,LJ+ Q{K,LJ)*GX{K,LI].E; %scalar * vector 
W: (PCK,LJ+ QLK,LJ)*GXLK,LJ.W1] endall ; %, 


where P and Q are pressure and artificial viscosity, respectively, just as 


in the FORTRAN code. In phase 1 the auxilliary variable RHOJ is produced as: 


.6 _ RHOJ : zonal := 
forall] K infLIM.KN, LIM.KX+1], L in LLIM.LN, LIM.LX+1] construct 
RHOLK,LI*AJLK,L4J endall; %, 


where RHO and AJ are density and area jacobian, just as in the FORTRAN code. 
Phase 2 of the cycle produces new values for each node, namely 
V and X. The fragment that produces values for a particular node, say 


node K,L appears in phase 2 of Fig. 6 as follows. 


node 
[K,LJ 


.7: Schematic representation of the production of a nodal value. 
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The fragment .7 is a schematic picture of an activity that draws on values 
from the four zones around node [K,L1] to feed into the production of new 


values of X and V for the node. Thus the fragment .7 stands for 


zone zone 
[K,L+1] {K+1,L+1] 


zone 
{K+1,L] 


zone 


.8: Completed fragment showing all zones connected to a node 


Each "cable" of values from a zone to node [K,L1] must carry STRESS and 

RHOJ from the zone, and at least one of these cables must bring the time 
steps DTNPH and DTN as well. (DTNPH and DTN are used here as they are in 
the FORTRAN code of SIMPLE (1979).) The activity of the node in .8 during 
phase 2 of the cycle is tocalculate an acceleration (ACC), to use this 
acceleration to update velocity (V), and then to use the velocity to update 
position (X). In updating velocity a time step DTN is used. Position times 
interleave the: times at which velocity is calculated, so that a different 
time step (DTNPH) is used to update position. Continuing to use the 
abbreviation of scalar operation signs for operations on vector values, 


this activity can be expressed in VAL as: 


28h x 
V, X := 
forall K in { LIM.KN, LIM.KX+1], L in CLIM.LN, LIM.LX+1] construct 
let Y: vector := (2./(RHOJ[K,L ]+ RHOJLK,L+1] + RHOJ[K+1,L] + RHOJ{K+1,L+11)) 
*(STRESSCK,L+1].£ + STRESS(K,L].W - (STRESS{K+1,L+1].W + STRESSLK+1,LJ.E)); 
ACC: vector := record[R: -Y.Z; Z: Y.R]; 
V1: vector := DTN*ACC + VCK,LI 
in V1, DTNPH*V1 + X[K,LJ endlet 
endal| 


After expansion of the vector operations, this code would provide the 
functions VELOCITY and POSITION of Sec. 5.2.3. 

Phase 4 involves arrays that are partly nodal and partly zonal in 
character. An element of CBB is obtained as an intermediate between two 
nodes of the same L-coordinate but adjoining K coordinates, and two zones 
bounded by the nodal K coordinates and on either side of the L coordinate: 

In calculating heat conduction subroutine CONDUCT of the SIMPLE 
FORTRAN code generates arrays CBB and DBB, per lines 1583 through 1608. 


CBB and DBB draw on values from both nodes and zones, as shown: 


aa 


| zone ; node 


| c= + war | 
node — _ (node zone | zone | 
CK-1,Ld iCK,LI Htktd CK+1,LI | 
! zone | bez — Tede = 
| [KL] [K,L-1] 
For CBBIK,L3 For DBBLK,LJ 


é 
-9: Nodes and zones that supply values to the calculation of CBBLK,L! 
and DBB{K,LJ. 
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To adhere strictly to the connectivity shown in Fig. 6, one programs the 
calculation of CBB and DBB in two parts, one as an augmentation of ZONE_GEOM 
and the other as part of an augmented ENERGY HYDRO. The augmentation 
consists of generating geometrical quantities as part of ZONE _GEOM, 
referring these to zones, as was done for GX, and then using these quantities 
to simplify the connectivity needed in ENERGY HYDRO. An alternative which 
is displayed in SIMPLE VAL of Sec. 5.2.3 is to accept a slightly more 
comp] ex eonnachivity and thereby avoid the introduction of more auxilliary 
variables. 

CBB and DBB are partly zonal and partly nodal in character, 
so that fitting them to either class of processors is arbitrary. Because 
the nodal processors are less heavily tied: we have assumed that they 
would be used to compute CBB and DBB from zonal quantities (CC in FORTRAN). 
The consequent connectivity is shown in phase 4 of Fig. 6. For the 


Z-sweep this connectivity is shown in more detail in .10: 
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Node Zone Node Zone 
(K,L-1] [K,L] as CK,L+1) 
. 
CBB | CBB | 


{ny 


CBB*(1-A), 
CBB*B CBB 


{A,B} 

CBB*(1~-A), wer 

CBB*B t CBB 

{A,B} 
ae 
TEMP2{K,L+2] 
TEMP2LK,L+1] 

TEMP2EK,LJ TEMP2LK,LJ TEMP2[ K,L+1] 


| 


| 


-10: Detail of Z-sweep of ENERGY HEAT. 
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Appendix C: The SIMPLE code in FORTRAN 


Edition of February 12, 1979 as provided by John Woodruff 


“OUDNOUKRON-OODNOUARON= 


NYAS ae 


SPUTT &ME,,,100000 100000, , , 2000 


lolerelelere) 


PROGRAM H2DDCHFILE, TAPES=HFILE) 


COMMON /KLS/ K,L, DEBUG, VERSION, WHER,WHEN, P106, PIE, IGEN, P1D2 


x ,OTC,KC,LC, DTEN, KEN, LEN, SKE, HN, SIEL, CNN, ENC, ENH, ENCG, WN 


x 


x E(33, 33), P(33, 33), AJ(33,33),$(33, 33) | NBC(33, 33) 
x ,W(33, 33), TEMP(33, 33) 
X , AC33)33),B(33) 33) ,CC( 33, 33), DUM(33, 33), CBB(33, 33) 
X |} DBB(33, 33), CAP(33, 33), SIG(34, 33) , TS(33; 33) 
COMMON /PARAM/ NYCL, TNUP, OTNUP, DTN, DTNPH, DTNMH, EDTIME, EDDT 
x ,GAM, GAMZ, COF,C1F,61, TMAX, DTMAX, DTMIN, TFLR, NOHYD 
x ,C2,P2,P3.N6,NTTY, NED 
COMMON /KLSPACE/ KMN, LMN, KMX, LMX, KMMZ, LMAZ, KMNP, LMNP, KMXP, LMXP 
COMMON /GENCOM/ RH6O, EO, UO, PO, WO, DR,0Z, NBCU, NBCD, NBCL, NBCR 
xX ,PB(3) ,PBB(3),QB(3) 
COMMON /MINMAX/ XMIN, XMAX, YMIN, YMAX,PMIN, PMAX, QMIN, QMAX 
xX, RMIN, RMAX, KQ,LQ,KR,LR,KP.LP 
»XMINX, XMAXX) YMINX) YMAXX 
COMMON /TIMING/ NBT(20)-,NCT(20), NETC20),NPT(20),NXT(20) 
COMMON /EOSCOM/ KEOS, TARG1, TARG2, TARGS, RARG1, RARG2, RARGS, 
X FUNC1,FUNC2, FUNCS, TEMPS, EPS, EPSO 
COMMGN /COM2/ NTSV(2), NRSV(2) , MSV(2}, TESC7) , RES(9) 
X ,AES(12), BES(12), CES(12) , DES(12), EES( 12), FES(12) ,GES(12) 
X }HES(12)_PES(12) I TES(3).IRES(3). 1 ZES(3> 
NCYL = CYCLE COUNTER EDTIME= TIME @T EDIT 
TNUP = PROBLEM TIME(N+1) EDDT = DELTAT NEXT EDIT 
DTN = DELTAT (N) TMAX = MAXIMUM TIME 
DTNPH= DELTAT (N+1/2) DTMAX = MAXIMUM ALLOWED DT 
DTNMH= DELTAT (N-1/2) DTMIN = MINIMUM ALLOWED DT 


» NCP, P1D8, VCUT 
COMMON /PROGG/ RO, Z0,R1,21,RP,2P,RR,2Z 
COMMON /COMN/ R(33,33) ,2(33, 33) ,U(33 


DIMENSION ARRAY (1) 
EQUIVALENCE (ARRAY, R) 


11/0/ 
DATA NLENKS/5/ 
DATA VERSION /1./ 
DATA NCP/10/ 
DATA IER/0O/ 
DATA Ue a 


D/O/ 
DATA PIE/3.1415926535898/ 
DATA EDTIME/O. / 
DATA EDDT/4./ 
DATA P1D2/,.5/ 


33), RHO(33, 33) ,Q(33, 33) 


PAGE 


He Hs HH RM = OQOOCOCOOO 


yet Sa oh ee a es es 
COODNAUAGN—-OOONHDUAW 


PAGE 
DATA TMAX/12.001/ 
DATA VCUT/1.E-10/ 
DATA OTEN/1.E+10/ 
DATA DTC/1.E+10/ 
CALL CHANGE (2H+H) 
CALL ASSIGN(3, 2RPH) 
CALL CLOCK (WHER, WHEN) 
ZERS SUT ALL ARRAYS 
L=21*33*33 
08 10 K=1,l 
ARRAY (K)=0. 
10 CONTINUE 
SET UP EOS TABLES 
CALL SETUP 


SET PARAMETERS FOR TEST PROBLEM 


Q 
b 
x4 
AAADN-QNM-OO 1! 
X 


v 
N 
HONwte eevee: - 


VOOoO VIVA 

eslockockutzealeeh pera) 
OWnaananniu 
wunnoo-oo- 


Vv 
@ 
Oo 
Seer nnn 


DTN=DTNPH 
DTNMH=DTNPH 


C2=1.5 
COF=C2*.25 
c1i=.5 


DNA INNNN NNO OODOAMDOOOOOUMGACOAGAUUGRABAASARAARADDOQOQHDGQOQOQONNNNNNNNN 
COONOHUILON—OCOSVNODKRON-OOBVOUAON-OOONTOUAON—-OODNOHUAON-OOONOMAON— 


PAGE 


CiF=.S«Cl 
GAMZ=GAM-1. 


GET INPUT PARAMETERS 


WRITECNTTY, 4) , 
4 FORMAT(23H ENTER INPUT PARAMETERS) 


READ(NTTY, 5)KMN, KMX, LMN,LMX, EDOT, EDtIME, TMAX 
5S FORMAT( 412, 3F5. 2) 


KMNP=KMN+1 
LMNP=LMN+1 


KMXP=KMX +1 
LMXP=LMX +1 


KMXZ=KMX- 1 
LMXZ=LMX- 1 


GENERATE PROBLEM 
CALL GEN 
I1GEN=0 

INITIALIZE TIMER 


O00 


oO 


o0a0gq 9 020090 


NED=1 

12=NECOND(11) 
START CYCLE HERE 
1 CONTINUE 


DTC2=1.E+12 
SKE=0, - 


ENH=0. 
DTEN=1.E+12 


c 
OPES SELES SSE SSS SSE SS SSS ESSE STS SESS SS ESSE SES SS 
* 


Cx 
ae GEOMETRY CALCULATIGN FOR BOUNDARY ZONES~ *x 
x * 


CR KIRK KKK OK OR KR OK OK KK 


SET UP BOTTOM SIDE BOUNDARY ZONES 


9 Onan 


) F 1CK+1,Lb) 


om el elolelele) 
aouv 
RRA 


RO=R(KMN,L) 


ee ee ee ae eee ee 
DOOGOOOGOOOHOHODDDOHHOHOH 
ONNOULRON-CODNOUAQN— 


200 


8 
209 


NANANNNNNNNNNN 
NODA A) a at a et 
ON—-OCOONOUAQN—O 


oO 


lelelolelelelomm oe) 


oO 


lololetolelele) QO 


oO 


R(K, 
O(K,L) 
P(K, 


ZO=Z(KMN,L) 


DO 200 K=KMN,KMXZ 


CALL PROJCT 


R(K,L-1)=RR 
Z(K,L-1)=22 


RO=R1 
20=Z1 


200 CONTINUE 
SET UP BOTTOM RIGHT CORNER 


CALL PROJCT 


R(K,L-1)=RR 
2(K,L-1)=22Z 


L+1) 


L=LMX 
RO=R(KMN,L) 
Z0=Z(KMN,L) 


DG 204 K=KMN, KMXZ 


+ 
_ 


NA N 
RR KK 
cr 
ts 


P(K,L+1) 
1(K-1,L) O¢K,L) 

. R(K,L-1) 
K=KMX 
L=LMN 
RO=R(K,L} 
ZO=2(K,L) 
RI=R(K-1,L) 
21=Z(K-1,L) 
RP=R(K,L+1) 
Z2P=Z(K,L4+1) 


SET UP TOP SIDE BOUNDARY ZONES 


1(K+1,L) 


loleleloleleoleomme) 


i?) 


oO 


QoOno000 


CALL PROJCT 


RCK,L+1)=RR 
Z2(K,L+1)=22 


RO=R1 
Z0=21 


204 CONTINUE 


SET UP TOP RIGHT CORNER 


NZ NA NA TA 


CALL PROJCT 


R(K,L+1)=RR 
Z(K,L+1)22Z 


SET UP LEFT SIDE BOUNDARY ZONES 


1¢K,L4+1) 


R(K-1,L) 
K=KMN 


RO=R(K, LMN) 


Z20=Z(K,LMN) 
DS 207 L=LMN, LMXZ 


R1=R(K,L +1) 
21=Z(K,L+1) 


RP=R(K+1,L) 
Z2P=Z(K+1,L) 


CALL PROJCT 


1,L)=RP 
Z2(K-1,L)=22 
1 


RCK~ 


RO=R 
Z0=21 


207 CONTINUE 


OCK,L) 


P(K+1,L) 


PAGE 


BDWWWWAWWWOWW 
Neo 
OCWONOATAGDN-O 


321 


QONMNOaN 


9 


OR00000 


oO 


OQOOHGND Oo io) 


Oo 


1K, 
PCK-1,L) OCK,L 


SET UP TOP LEFT CORNER 
R(K-1,L) O¢K,L) P(K+1,L) 
1K,0-1) 
K=KMN 
L=LMX 
RO=R(K, UL) 
ZO=Z(K,L] 
R1=R(K,L-1) 
ZieZ(K,bL-1) 
RP=R(K+1,L) 
ZP=Z(K+1,L) 
CALL PROJCT 
R(K-7,L)=RR 
Z(K-1,L)=22 


SET UP RIGHT SIDE BOUNDARY ZONES 


L+1) 
) 


K=KMX 
RO=R(K, LMN) 
ZO=2(K,LMN) 


DO 210 L=LMN,LMxXZ 


RI=RCK,L+1) 
Z15Z2(K,L+1) 
RP=R(K-1,L) 
ZP=Z(K-1,L) 
CALL PROJCT 
R(K+1,L)=RR 
Z(K+1,L)=2Z 
RO=R1 

ZO=Z1 


210 CONTINUE 
SET UP TOP RIGHT CORNER 


P(K-1,L) Of€K,L) R(K+1,L) 
1¢K,L-1) 
K=KMX 
L=LMX 


R(K+1,L) 


PAGE 


Q. 0 


Q00000 


fe) 


fe) 


oO 0 


lelelelelele) 


Pnceee 


Q 


Sea 


88 


ADBASABSBAABAADA 
Dn at ak a et et et et 
OOONGAGRON=6 


PAGE 


R(K+1,L) =RR 
2(K+1,L)=Z2 


SET UP TOP RIGHT CORNER 
P(K-1,L41) 1CK,L+1) R(K+1,L41) 
O(K.L) 


a 


CALL PROJCT 


R(K+1,0+1)2RR 
Z(K4+1, 0417222 


SET UP BOTTOM LEFT CORNER 


OCK,L) 
RCK-1,L-1) 0 1CK,L-1) PC(K+1,L-13 


CAL ROICT 


R(K-1,L-1)=RR 
Z(K~-1,L-1)=22 


SET UP BOTTOM RIGHT CORNER 


P(Kt?,L+1) 
OCK,L) 1CK+1,L) 


lols) 


ie] 


fololelolelele) 


io] 


R(K+1,L-1) 


NA NA ND TA 


TU -~— OO 
wu 


CALL PROJCT 


‘RCK+1,L-1)=RR 


Z(K4+1,L-1)=2Z 


SET UP TOP LEFT CORNER 


R(K-1,L+1 
1(K-1,L) O(K,L) 
P(K-1,L~-1) 
L=LMX 
K=KMN 
RO=R(K,L) 
20=2(K,L) 
R1=R(K-1,L) 
Z1=Z2(K-1,L) 
RP=R(K-1,L-1) 
Z2P=Z(K-1,L~-1) 


CALL PROJCT 


R(K~-1,L4+1)=2RR 
Z2(K-1,L41)=2Z2 


c 
CaK KKK KK KKK KR EK KR KKK KKK RK KKK KK KKK KK KK 


Cx 
cx 
Cx 


« 
SET UP BOUNDARY ZONE ATTRIBUTES * 
« 


CORO OOOO IO IO OIE 


oO | HAMNO 


SET UP BOTTOM SIDE BOUNDARY ZONES 
(K,L) = (K,L+1) 


L=LMN 
DO 255 K=KMNP,KMX 


RHO(K,L) =RHO(K,L+1) 
AJ(K,L)=AJ(K,L41) 
IP=NBC(K-1,L) 
Q(K,L)=QB(1P)*Q(K,L+1) 


PAGE 


COOVNOUAON—-O 


AAAGAAAGHONAG 
oOo 9 |9fN000 0 


Nw eae aes 


a Qa OH0NH00 Oo 


oO 9 90000 4 


ona0 90 


P(K,L)=PBB( IP) +PBCIP)*P(K,L+1) 


255 CONTINUE 
SET UP RIGHT SIDE BOUNDARY ZONES 
(K+1,L) = (K,b) 


K=KMX 
DS 265 L=LMNP, LMX 
RHO(K+1,L) =RHOCK,L) 
AJ(K+i, LIFAJCK SL 
Me Lo tHe 


) 
Q(K,L 
P(K+1,L)=PB +PBC(I 


on 
WW 
* 
3 
® 
es 


265 CONTINUE 
SET UP TOP SIDE BOUNDARY ZONES 
(K,L+1) = (K,L) 


L=LMX 
DO 275 K=KMNP, KMX 


RHO(K,L+1)= trae L) 
AJ(K L+ii= AJ He K, 


i} 
Q(K, L#1) =QB (IP) xQ(K, L) 
P(K,L+1)=PBBCIP) #PBCIP) *P(K,L) 


275 CONTINUE 
SET UP LEFT SIDE BOUNDARY ZONES 
(K,L) = (K+T,L) 


K=KMN 
DS 285 L=LMNP,LMX 


RHO(K,L)=RHO(K+1,L) 
AJ(K, LI=ASCKF1, LS 
IP=NBC(K,L-1) 
Q(K,L)=QB(1P)*Q 
P(K,L)=PBBC(IP)+ 


285 CONTINUE 
SET UP BOTTOM LEFT CORNER 


PCKMN, LMN) =P (KMNP, LMNP ) 
QC(KMN, LMN) =Q(KMNP, LMNP ) 
RHOCKMN, LMN) =RHO CKMNP , LMNP ) 
AJ (CKMN, LMN) =AJ (KMNP, LMNP) 


SET UP BOTTOM RIGHT CORNER 


PAGE 


PAGE 10 


PC KMXP, LMN) =P (KMXP, LMN+1 ) 
Q(KMXP, LMN) =Q(KMXP, LMN+1 ) 

RHO (KMXP, LMN) =RHO(CKMXP, LMN+1) 
AJ (KMXP, LMN) =AJ (KMXP, LMN+T1 ) 


SET UP TOP RIGHT CORNER 
PCKMXP, LMXP) =P (KMXP, LMX) 
Q(KMXP; LMXP) =Q(KMXP; LMX) 

RHO (KMXP, LMXP ) =RHO(KMXP, LMX) 
AJCKMXP, CMXP) =AJ CKMXP, LAX) 
SET UP TOP LEFT CORNER 
P(KMN, LMXP) =P (KMNP, LMXP) 
Q(KMN, LMXP) =Q(KMNP, LMXP) 
RHO (KMN, LMXP) =RHO(KMNP, LMXP) 
AJCKMN, LMXP) =AJ (KMNP, LEIXP) 
GET BOUNDARY CONDITION COMPUTE TIME 


12=NECOND(11) 
NBT (NED) =NBT (NED) +12 


Cc 
e DEBUG EDIT 


IF (DEBUG.EQ.0.) GO TS 442 
1GEN=1 


WRITE(NO, 441) 
441 FORMAT(SH DEBUG 1) 


“ CALL EDIT 
‘ 442 CONTINUE 
DO 450 L=LMN, LMX 


DO 445 K=KMN,KMX 
COMPUTE ACCELERATION 


200 Qa 


folele) 


AU=(P(K,L)+Q(K,L)) * (2Z2(K,L-1)-Z(K-1,L)) + 
X  (PCKET, L) 4Q(K41,L) )*(Z(K41,L)-Z0K,L-1)) + 
X (PUK #1041) 4Q(K41, L471) x (ZCK, LAT I-Z(K41,L)) + 
X CPCK, L471) 4Q(K,L 41) x (Z(K-1,03-20K, L417) 
AW=(P(K;L)+Q(K,L)) * CRCK,L-1$-R(K-1,L)) + 
XX  (PCKFT,L)4QCK41,L)) « ORCKS1,L)-RCK,L-1)) + 
X  (PCK415041) 4QCK41,L41)) *& (ROK, L41)-RCK41,L)) + 
X  CPCK,L41)4Q(K, L413) & (ROK-1,L3-RCK,L41)) 
AUW=RHO(K, L) xAJ(K,L) +RHO(K4+1, LL RAJ(K4+1,L)+RHO(K,L +1) AS (CK, L+1) 
X  #RHOCK41, L +1) «AS (K41, L417) 
AUW=2. /AUW 
AU= -AUXxAUW 
AW=AW*AUW 
4 U(K,L) =UCK, L) #DTNXAU 
C ADVANCE VELOCITIES TO N+1/2 FROM N-1/2 
W(K,L)=WtK, L) +DTNxAW 
C POSITION (N+1) 
IFC ABS(UCK,L)). LE. VCUT)UCK, L) 20. 
IF CABS(WC(K.L)) .LE. VCUT)W(K,L) 30. 
ACK, L) =UCK,L) x*24W(K,L) ax2 


0 
611 


AQ eS a Se et es 
~“OOONOGUROND 


OG) 5) G) 0) 0) 0) 6 
oO 


445 CONTINUE 
450 CONTINUE 
[FCNOHYD.EQ.1) GO TS 455 
c OnE! TO SKIP HYDRO 


U(K,L) 
Z(K.L)=Z¢K.,L) +DTNPH*W(K,L) 
451 CONTINUE 
452 CONTINUE 
ACCELERATION VELOCITY AND 
CO-ORDINATES DONE 


BEGIN LOOP 3 
TEMPRY xxXXKKSKKX 


DEBUG EDIT 
IF (DEBUG.EQ.0.) GO TO 455 
IGEN=1 


leololelelelelele) 


WRITE(NG, 456) 
456 FORMAT(SH DEBUG 2) 


CALL EDIT 
‘455 CONTINUE 
CALL HWORK 


COMPUTE HYDRO WORK ON THE BOUNDARY 
DO 490 L=LMNP, 


20 0 0 90 


2 
q 
> 
Qo 
a 


S=VOLUME/2X (CM**x3/RA 
VN=1./RHO(K,L) 
VN=SPECIFIC VOLUME AT (N) 

VNP=SPECIFIC VOLUME AT (N+1) 

RHO(K,L) =RHO(K, L) *SN/S(K, L) 

DUM(K,L12RHO(K,L)*xS(K,L) 
DUM=MASS 

DENSITY AT N+ 

VNP=1. /RHO(K,L) 

DELV=VNP-VN 
COMPUTE ARTIFICIAL VISCOSI 

DRK=R(K,L)-R(K 
DRL=ER(K,L)-R(K 


O00 00 


foto] 


END OF NECOND PASS 


PAGE 


41 


708 
709 


NSNNANNINN INN 
Yas a a es ee 
OWbONOURGN—-O 


DZK=2(K,L)-Z(K-1,L-1)+Z(K,L-1)-Z(K-1,L) 
D2L=27(K.L)J-Z2(K-1.L-1)+2(K-1,L)-Z20K,L-1} 
DUK=U(K,L)-U(K-1,L-1)+#U(K,L-1) -U(K-1,L} 
DUL=U(K~L)-U(K-1.L-1)+UCK-1,L) -U(K,L-1} 
DWK=W(K.L)-W(K~1,0-1)+W(K,L-1) -WEK-1,L) 
DWL=W(K.L)-W(K-1,L-1)+W(K-1,L) “WOK, L-1)} 
c 
c DRK=2DR/DK 
C DRL=2DR/DL 
W1= DRK*DWL-DZK*DUL 
W2= OUK*DZL-DWK*DRL 
Q(K,L)=0. 
W3=0, 
W4=0. 
IF(W1.LT.O. )W3=W1*2/ (DRK*xX2+DZK**2} 
IF(W2.LT.0. )W4=W2x*2/ (DRL¥xX2+DZL«*2 } 
IFC (W34W4) .EQ.0.) G6 TO 465 
CA=SQRT (GAM*P(K;,L) /RHO(K, L)) 
DON’T COMPUTE Q IF ZONE IS NOT BEING COMPRESSED 
Q(K,L)=COFxXRHO(K,L)*(W34W4) + CIFXCAXRHOCK,L)*SQRT(W3+W4) 
C CIF= Cix,5 COF= .25*COx*2 CA=SOUND SPEED 
IF(CA.EQ.0.) GO TO 465 
TSO=(AJ(CK,L)**2)/ (CAXCAX (DRKX*2+DRL**24+D2K *X24+DZL**2) ) 
IF(DTC2.LE.TSO) GO TO 462 
C HAVE A NEW MINIMUM DELTA T 
DTC2=TSO 
KC=K 
LC=L 
462 CONTINUE 
465 CONTINUE 
EPS=E(K,L)-(P(K,L)+Q(K,L)) *DELV 
C E(N+1) 
RARGT=RHOCK,L) 
CALL TEMPCAL 
TARG1=TEMPS 
CALL IES1 
PNP=FUNC1 


C GAMMA~-LAW EGOS calle GAM- 


E(K,L)=E¢(K,L)- “Bx (PP SP (K L))+Q(K,L))*DELV 


ECK, CsAMAKT (E(K L),1.E-30) 
EPS=E(K,L) 
CALL TEMPCAL —_ 
C GET TEMPERATURE AS FUNCTION OF E.RHO 
TARG 1 =AMAX1( TEMPS, TFLR) 
TEMP(K,L)=TARG1 
CALL _IES1 
GET PRESSURE 
P(K,L)=FUNC1 
ECN+1) 
PCN+1) 


Oo 900 © 


KINETIC ENERGY FOR THE ZONE 
485 CONTINUE 
490 CONTINUE 


SRR CREA RETR R ERE eRe e Tere es STEN OF LOOP 3 


SKE=SKE+P1D8*DUM(K,L) x (ACK, LI +ACK-1,L)4ACK,L-1)4A(K-1,bL-1)) 


PAGE 
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721 C DEBUG EDIT 
722 C 

723 IF (DEBUG.EG.0.) GO TO 495 

724 IGEN=1 

725 C 

726 WRITECNG, 493) 

727 493 FORMAT(SH DEBUG 3) 

728 C 

729 CALL EDIT 

730 

731 495 CONTINUE 

732 

733 1GEN=0 

734 12=NECOND( 11) 

735 NXT CNED) =NXT (NED) +12 

736 C 

737 CALL CONDUCT 

738 C DO HEAT CONDUCTION 

739 C 

740 NYCL=NYCL+1 

741 © ADVANCE CYCLE COUNTER 

742 DTNMH=DTNPH 

743 DTC=SQRT(DTC2) 

744 DTNPH=DTC 

745 DTNPH=AMIN1 (DTNPH, DTEN, DTMAX) 

746 C LIMIT MAGNITUDE GF DT 

747 DTN=. 5* (DTNPH+DTNMH) 

748 TNUP=TNUP+DTNPH 

743 IF (DTNPH.GE.DTMIN) GO Te 602 

750 Cxxxexaaxex DT TS BELOW ALLOWED MINIMUM x#4x «xxx 
751 WRITE(NO,601)NYCL, TNUP, DTNPH, DTMIN 

752 WRITE(NS. 601) NYCL. TNUP, DTNPH, DTMIN 

753 601 FORMAT(12H DTSTOP NYCL/16,3H T ,€12.4,4H DT ,£12.4) 
754 GO T6 3999 

755 602 CONTINUE 

756 TE=SKE+ENH 

757 CN=TE-HN-WN 

758 IF(NYCL.EQ.1) CNOLD=CN 

759 CNN=CN-CNOLD 

760 ENCG=ENCG+CNN 

761 CNOLD=CN 

762 IF (MOD(NYCL,NCP).NE.O) GO TO 603 

763 WRI TE(NG, 706) 

764 706 FORMAT (64 CYCLE, 4X, SHTIME ,7X,2HDT, 10X, 3HDTC,5X,8H KC 
765 3HDTE, 5X, 8H KEN LEN} 

766 (NG STUNTED NUP DTNPH, DT¢,KC,LC, DTEN, KEN, LEN 
767 707 FORMAT CLS. 3E12.4,314,612.4, 214) 

768 22=ABS(ENC- CENFI#TIN) DENG 

769 WRITECNG, 708) 

770 708 FORMAT( 4X, 4HETOT,8X,4HIE ,8X,4HKE ,8X%,4HHN  , 8X, 4HWN 
771 x 8X) 5HECONS, 7%, SHCN(N) , 7%, 4HECNG } 
772 WRITE(NG, 709) TE, ENC, SKE, HN, WN, 22, CNN, ENCG 
773 709. FORMAT(8E12. 4) 

774 603 CONTINUE 

775 12=NECOND(11) 

776 NCT (NED) =NCT (NED) +12 

777 C RUN TIME FOR PHYSICS 

778 IFCTNUP.LT.EDTIME) GO TO 605 

779 C TIME TO EDIT 

780 CALL EDIT 


Lc, 


4 


PAGE 
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s09 


SVHDODHDWDHOONMDDOO 
NANAN Ae SS eS ee 
ROD -OWONOUAON—O 


604 


WRITECNO, 604)NYCL, TNUP, DTNPH, ead KR, LR 


FORMAT(12H EDIT NYCL= 


c MES aoe TO 


TTY 
IME=EDTIME+EDDT 


, 16, 2Ei2. 


C ADVANCE EDITME TO NEXT VALUE 


605 


CONTINUE 
IF CTNUP.LT.TMAX) GO_TO 


,E14: 


5S, 


214) 


610 
Cex Kck eset PROBLEM HAS REACHED TMAX 1 8 1308 300% 


agQ 


ao 


607 
610 
999 


WRITECNG, 607)NYCL, TNUP, 
WRITECNO,607)NYCL, TNUP, 
FORMAT (12H STOP TMAX 
CONTINUE 

GO To } 

CONTINUE 


PROBLEM COMPLETED GET OFF 


616 


618 


619 


620 


CALL PLOTE 


WRITECNG, 61 | CNBT(K),NCTCK), NET{K), NPTCK), NXT(K),K=1, NED) 


FORMAT (5(1X,110)) 


11=0 
0S 618 K=1,NED 


Bere 
+16, 2612.4) 


/ 


1 
SE ENOE CK) +NCTCK) +NETCK) #NPT(K) tNXT CK) 


I 
CONTIN 


T 
r 
T 
NPT 
tT 
CON 
AES 
$ 
Ss 
Ss 


~Qranaen MZZZZZRO 


te ed ell ed 


RETURN 
END 
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SUBROUTINE GEN 
THIS SUBROUTINE GENERATES THE INITIAL PROBLEM To BE RUN 
COMMON /KLS/ K,L, DEBUG, VERSION, WHER, WHEN, P1D6, PIE, IGEN, P1D2 
x ,DTC,KC,LC, DTEN, KEN, LEN, SKE, HN, STEL, CNN, ENC, ENH, ENCG, WN 
COMMON /PROGG/ RO, Z0,R1,21,RP,2ZP,RR,ZZ 
COMMON /COMN/ R(33,33),2(33, 34) ,U(33, 33), RHO(33,33),Q(33,33) 
x E(33,33),P(33, 33), AJL33, 33), $(33, 33) | NBC(33, 33) 
x ,W(38, 33), TEMP(33, 33) 
X , A(33°33),8(33; 33) ,CC(33, 33) , DUM(33, 33), CBB(33, 33) 
x * 0BB( 43, 33), CAP(33, 33) , SIG(34, 33) , T$(33, 33) 
COMMON: /PARAM/ NYCL, TNUP, DTNUP, DTN, DTNPH, DTNMH, EDTIME, EDDT 
, GAM, GAMZ, COF.C1F,¢1, TMAX, DIMAX, DTMIN, TFLR, NOHYD 
*c2,P2,P35N6,NTTY, NED 
COMMON /KLSPACE/ KMN,LMN,KMX, LMX, KMXZ, LMAZ, KMNP, LMNP, KMXP, LMXP 


COMMON /GENCOM/ RHOO,EO,U0,PO,WO,DR,DZ,NBCU,NBCD,NBCL,NBCR 
xX ,PB(3), PBB(S),GB(3) 


COMMON /MINMAX/ XMIN,XMAX, YMIN, YMAX, PMIN, PMAX, QMIN, QMAX 
X, RMIN, RMAX,KQ,LQ,KR,CR,KP, 
x" OXMINX, XMAXX) YMENXS YMAXX 
IGEN N@T EQUAL O WILL CAUSE THE EDIT ROUTINE TO PRINT ALL THE VARIABL 


DATA IGEN/1/ 


xx 


CK KKK KK KOK RK KKK KKK KKK 


Cx * 
= GENERATE NBC ARRAY * 
x * 


CK RK KKK KKK KKK RR KKK KOK KK 


Q 900 


ao 000 0 


SET BOTTOM AND TOP BOUNDARY CONDI TIONS 
DS 52 K=KMN, KMX 


NBC (K, LMN) =NBCD 
NBC(K,LMX) =NBCU 


52 CONTINUE 
SET LEFT AND RIGHT BOUNDARY CONDITIONS 
DS 54 L=LMN,LMX 


NB&CCKMN, L)=NBCL 
NBC (KMX,L)=NBCR 


c 
‘ 54 CONTINUE 


TES TESESE LS SSS SSS SSS SSCS ES ES SS SESE SS tS St SY 


908 
309 


DHWOWOWWOWOMWO 
a et ee ed 
OCOONOCADN—O 


Cx x 
Cx GENERATE COORDINATES AND VELSCITIES : 
Cx 


Cx 
c 
c 
c 


c 


Q 


O00 


c 
Cc 


Cc 
Cx 


Cx 
Cx 
Cx 
Cx 
c 


a 0° 9 


a00 


FIO IO ROO OR IORI IO IO ICR ACK AGI ACR 
INITIALIZE THE MINIMUM AND MAXIMUM VALUES OF R AND Z 


XMINX21,.E+6 
XMAXX=-1.E+6 


YMINX=1.E+6 
YMAXX=-1.E+6 


RP=LMX-LMN 
ZP=KMX -~KMN 


DS 5&8 K=KMN, KMX 
212=10+K-KMN 
DS S57 L=LMN,LMX 
COMPUTE THE COORDINATES R AND Z 


RR=L-2 
Z2Z=(-.5+RR/RP) XPIE 


R(K,L)=2Z21*COS(2ZZ) 
2(K,L)=Z21*SIN(C2ZZ) 


FIND THE MINIMUM AND MA 


XMINX=AMIN1 CXMINX,R 
XMAXX=AMAX1 CXMAXX, R 


i¢ 
( 
YMINX=AMINI CYMINX, 2( 
YMAXX=AMAX1 CYMAXX, ZC 


57 CONTINUE 
58 CONTINUE 


x 
K 
K 
K 
K 


4 
3 
a 
3 


PES SSSSSSSESSSSS SSS S ESE SESE SSE SSS EST TSS SCS ETS SS ESE ES SESS SS 


GENERATE ZONE QUANTITIES RHO, P, E AND COMPUTE AREA 


x 
ROKR OK RRR ROK ROKK ROKK KOKO KKK OK KOK KK KOK CK KKK KK KK KKK KK KKK KKK 


P1D6=1./6. 

DG 65 L=LMNP,LMX 
DO 63 K=KMNP, KMX 
RHO(K, L) =RHOO 
P(K,L3=PO 
E(K,L)=E0 


COMPUTE JACGBIAN 


AJ1=ROK,LI*CZ(K-1,L)-Z(K,L-1)) 
x +R(K,L-1) (20K, L)-Z20K=1,6) 
AJS=R(K-1,L)*«(Z(K-1,L-1)-20K,L-1)) 
x #R(K,L-1) «(Z2(K-4,L)-Z0CK-4,L-1) 
AJ(K,L)=P1D2* (AJ1+AJN3) 

S(K,L) =P1D6x((R(K,L)+R(K-1,L) +R(K, 
x (R(K2L-1) +RCK-1, 0) +RE 


c 
a 63 CONTINUE 
‘ 65 CONTINUE 


CR KOK OK KOK KR ROKK 


Cx 
Cx 
cx 


lolol) 


DEBUG EDIT * 
* 


CHKKKK KKK KKK KER KK 


IF (DEBUG.EQ.0.)G6 TO 80 


PRINT NBC BOUNDARY SENTINELS 


71 


72 


73 


74 


60 


85 


WRITECNS, 71) (NBCCK, LMN) , K=KMN, KMX) 
FORMAT (3HLMN, 8011) 


WRITEC(NO, 72) (NBCCK, LMX) , K@KMN, KMX) 
FORMAT (S3HLMX, 80.11) 


WRITECNO, 73) (NBC CKMN, L), L=LMN, LMX) 
FORMAT (3HKMN, 801 1 


) 
WRITECNG, 74) (NBC(KMX,L), L=LMN, LMX) 
FORMAT (3HKMX, 8OI 7) 


CALL EDIT 
CONTI NUE 

WRITE(NG, 85) 

WRITECNTTY, 85) 
FORMAT(21H GENERATION COMPLETED) 


RETURN 
END 


SER ate Cc ae ee 


RCL Ue ahha eRe bane 
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SPO NSP PP CY Par er OP Ye YY nC 
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CABABABRABRKALADDQDDODQQDONNNANNNNNNN =H Heats a ee 


a000 0 


900 


aang0 


O00 


SUBROUTINE EDIT 


COMMON /KLS/ K,L, DEBUG, VERSION, WHER, WHEN, P1D6,PIE, [GEN,P102 
‘ ED TG ahGe EG. PT EN. KEN, CEN SRE HAAS Peer CONT ENG ENE ENCGANN 


COMMGN /COMN/ 
,E(33, 33), Pcs, 33), AJ(33, 33), $(33,33)) NBC(33, 33) 


x 
x 
¥ , A(33,33) B(33,33), Seo 33), BUM(33, 33), Say 33) 


> DBB( 33, 33), CAP(33,33),$1G6(33, 33), 
COMMON PARAM/ NYCL, TNUP, DTNUP, DTN, DTNPH, DTNMH, EDTIME, EDDT 


xx 


COMMGN /KLSPACE/ KMN,LMN,KMX,LMX,KMX%Z, LMAZ, KMNP, LMNP, KMXP, LMXP 


COMMON /MINMAX/ XMIN, XMAX, YMIN, YMAX,PMIN, PMAX, QMIN, GMAX 
X, RMIN, RMAX,KQ,LQ,KR, LR, KP, LP 


,W(33,33), TEMP (33 


R(33, 33) ,2(33, 34) ,U(33, 33), RHO(33, 33) ,Q(33, 33) 


T$(33; 


M,GAMZ,COF,CIF, Ci: TMAX, DTMAX, DTMIN, TFLR, NGHYD 


G2 P2,P3,N6, NTTY, NED 


x’ , XMINX, XMAXX, YMINX, YMAXX 
COMMON /TIMING/ NBT(20) ,NCT(20),NETC20), NPTC20),NXT(20) 


TEMPIS SUBROUTINE EDITS ALL MESH VARIABLES 
DATA N100/100/ 


INITIALIZE MINIMUM AND MAXIMUM VALUES OF RHS, P, Q@, R AND Z 


INITIALIZE LOCATIGN GF MAXIMUM VALUES OF RHO, P AND Q 


11=0 
RMIN=1.E+6 
RMAX=-1.E+6 
PMIN=1.E+6 
PMAX=-1.E+6 
QMIN=1.E+6 
QMAX=-1.E+6 
XMIN=1.E+6 
XMAX=1.E-6 
YMIN=1.E+6 
YMAX=1.E-6 
KR=0 

LR=0 

KP=0 

LP=0 

KQ=0 

LQ=0 


FIND THE MINIMUM AND MAXIMUM VALUES OF RHO, P, Q@, R AND Z 
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DO 715 L=LMNP, LMX 
DO 714 K=KMNP,KMX 


[FCRHO(K,L).LE.RMAX)GO TO 701 
RMAX=RHOCK, L) 

KR=K 

LR=L 


701 CONTINUE 


IFCP(K,L).LE.PMAX)GGO TO 702 
PMAX=P(K,L) 

KP=K 

LP=L 


702 CONTINUE 
IF(Q(K,L).LE.QMAX)GO TO 703 
QMAX=Q(K,L) 

K@=K 
‘ LQ=L 
703 CONTINUE 

RMIN=AMIN1 (RMIN, 

PMIN=AMINI (PMIN. 

QMIN=AMIN1CQMIN, 

XMIN=AMIN 

x 
N 
x 


— 

wet 
“ 
~ 


XMAX=AMA 


YMIN=AMI 
YMAX=AMA 


714 CONTINUE 
715 CONTINUE 
PRINT PROBLEM PARAMETERS 


WRITE(NG,717) NYCL, TNUP, DTNPH, OTN, VERSION, WHER, WHEN 
717 FORMAT(6H NYCL ,16,6H TIME ,£12.4,7H DTNPH ,E12.4,5H DTN , 
X E12.4,9H VERSION ,F4.1,2A10) 


WRITE(NO,718) PMAX,KP,LP, QMAX,KQ,LQ,RMAX,KR,LR 
718 FGORMATC14H MAXIMUM (K,L),E12.4,214,3H P ,E12.4,214,3H Q , 

X E12.4,214,5H RHO ) 

UVTEST=1.E-5 

KL=KMN 

LL=LMN 

KU=KMX 

LU=LMX 

IFCIGEN.EQ.0) GO TO 720 

EST=-10 


cere ron 


— er ee 
— tl 


UVT . 

PRINT ALL MESH POINTS 

IGEN, NEO WILL RESULT IN EDIT OF ENTIRE MESH,=O ONLY ACTIVE ZONES 
LL=LMN-1 


KU=KMX +1 


ery 


PAGE 


LUSLMX +4 
72Q CONTINUE 
E 


. PEGIN EDIT 


ALON HDSOAnAAKWN-OWUOMNOUAWN 


WOOWOWNNUNANANNNY eae 


fs a ed a es es ce a 


DG 740 Lehi, tu 
WRITECNG, 725) 
725 FORMAT (38H L K, 48, THR, 1OX, THZ, 10%, THU, 10K, 1dwW, 10%, SHRHO, 
x 8X, THE, 10X, THR, 10X, 1H, 10X, 2HAS, 9%, SHTHETA) 
DO 738 K=KL,KU 
IF(CABS(CUCK,L))+ABS(W(K,L))),. LE. UVTESTIGG TO 738 
C DONT PRINT VARIABLES IF NO MOTION ; 
WRITECNS, 726)L,K,RC(K,L),2(K,L),UCK,L),WCK,L),RHOCK,L),E(K,L) 
X,PCK,L),Q(K,L),AJCK,L) , TEMP(K, LL) 
FORMAT(2'4, 10E11.3) 
738 CONTINUE 
CONTINUE 


NETCNEO)=NECOND( 11) 


NPT CNED) =NECOND( 11) 
NED=NED+1 
IFC(NED.GY.20) NED=1 


RETURN 
END 


aa ek ee ek a ke kh ek kk tt kt 
ch et me ee me ee eh lh we eh ee eh eh edt hh ech me me mee ee ae me ed ee eb aed me 
NNOWDAHAIMOOOOOHUIGAUUGUCOUCUGAARDRRAARRAQOQQO 
“ODO DNOUADQN-OKOODNOUAGDN-CODNOCADN-OOONO 
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SUBROUTINE TEMPCAL © ** 2° "5 


COMMGN /EGSCOM/ KESS, TARG1, TARG2, TARG3, RARG1 , RARG2, RARGS, 
xX FUNC, FUNC2,FUNCS, TEMPS, EPS, EPSO 


INVERSE TABLE LOS&K-UP 
DATA P1IM6/1.E-6/ 
TARG1=0. 


Q00 


CALL_IES2 
C E=E6S(0, RHO) 
EPSO=FUNC1 
TEMPS=0. 
IFCEPS.LT.EPSO) RETURN 
RETURN TEMPETA = 0 IF BELOW TABLE 
TEMPS=10. xEPS 
INITIAL GUESS 
10 TARG1=TEMPS 


CALL IES2 


FUNC2=FUNC]) 
TARG1=TARG1+P1M6 


CALL IES2 


DTEMP=P 1M6x ( (EPS-FUNC2) / (FUNC1 -FUNC2) } 
TEMPS=TEMPS+DTEMP 
IFC TEMPS.LT.P1M6) GO TO 20 
IF(ABS(DTEMP).GT.P1IM6) GO TG 14 
CONVERGED 
RETURN 
20 «6TEMPS=O. 
RETURN 
END 


Oo oOo 


NN) 3 a a ss os 
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PAGE 22 
SUBROUTINE JES 


COMMON /EOSCOM/ KEOS, TARG1, TARG2, TARG3, RARG1, RARG2, RARG3, 
xX FUNC1T,FUNC2,FUNC3, TEMPS,EPS,EPSO 


COMMGN /COM2/ NTSV(2),NRSVC2) ,MSV(2),T 
x ,AES(12),BES(12),CES(12),DES(1i2), EES( 
xX ,HES(12),PES(12), ITES(3), IRES(3), 1 2ES 


N=1 
RETURN 
carey IES] 


EXTT=1. 

EXTR=1. 

TARG=TARG1 

RARG=RARG1 

IBOUND=0 

IESTB=1 

Go TS 5000 

110 CONTINUE 

FUNC = AES(M) +RARG* (BES ( 
1 +TARG* (CES (M) +RARG* (FES 
2_+TARG* (EES(M) +RARG* (HES 
FUNC1=FUNCXEXTTXEXTR 


RETURN 
c IES2 ENERGY=FUNCTIONCTEMPETA RHO) 
ENTRY IES2 
N=2 
EXTT=i. 
TARG=TARG1 


RARG=RARG1 
IBOUND=0 


a= mn 


) +RARG*DES ¢ 
M) +RARG*GES 
Mm) Ss 


c 
(M) +RARG*PE 


)> 
M)) 
M))>) 


210 CONTINUE 
FUNC = AES(M) +RARG* (BES ( 
1 +TARG*x (CES(M) +RARG* (FES 
2_+TARG* (EES (M) +RARG* (HES 
FUNC1=FUNCKEXTT 
RETURN : 
c TABLE LOOK UP 


c 
S000 NT=NTSV(N) 
NR=NRSV(N) 


)+RARG*DES¢ 
M) +RARG*GES 
M) Ss 


¢ 
(M) +RARG*PE 


)> 
M)) 
M}) 3) 


MLR = 
S MLT = 0 
IF(TES(NT).GT.TARG) GO TO 35100 
é IFCTES(NT+1).LE,.TARG) GO TO 5200 
5 TARG IN SAME T STRIP AS FOR PREVIGUS ENTRY 
IF CRES(NR) .GT.RARG) GO TO 5300 
" IFCRES(NR+1).LE.RARG) GO TO 5400 
c TARG AND RARG IN SAME BOX_AS FOR PREVIOUS ENTRY 
a M SAME AS FOR PREVIOUS ENTRY,FAST RETURN 


1250 


5100 


N00 AMNHOOO 


5105 


Q000 900 


5115 


oO 


5120 


o000 oan 


5200 


200 000 0090 


5205 


oO 09020 


eee 110,210) , LESTB 
T SEARCH 
TARG BELOW T STRIP GF PREVIGUS ENTRY 
GUT OF TABLE TEST, LOW T 
IFCNT.LE.ITES(N))} GO TO S115 


SEARCH TG NEXT LOWER T STRIP 


NT=NT~-1 
IFCTES(NT).GT.TARG) GO TO 5120 


STRIP CONTAINING TARG FOUND, BEGIN R SEARCH 
IF CRES(NR) -RARG) 5410,5310,5320 


TARG BELGW LOWEST TABLE ARGUMENT AND 
WAS BELOW TEMPAT ARGUMENT ON PREVIOUS ENTRY 


MLT=~1 
EXTT=EXTT*TARG/TES(NT) 
TARG=TES(¢NT) 
IF CRES(NR).GT.RARG) GO TOS 5300 
IFCRES (NR+1).LE.RARG) GO TO 5400 
M = _MSV(N) 
G6 TS (110,210) , 1ESTB 
GUT OF TABLE TEST, LOW T 
IFCNT.GT.ITES(N)) GO TS 5105 


TARG BELOW LOWEST TABLE ARGUMENT BUT 
WAS NOT BELOW TEMPAT ARGUMENT GN PREVIOUS ENTRY 


Pade eaTTgnO een 

BEGIN R SEARCH 
IF CRES (NR) -RARG) 5410, 5310, 5320 

GUT OF TABLE TEST, HIGH T 
IFCNT-ETES(N+1)+2) 5205,5215, 5205 

SEARCH TG NEXT HIGHER ¥ STRIP 


NT=NT+1 
IFCTES(NT+1).LE.TARG) GO TO 5220 


STRIP CONTAINING TARG FOUND, BEGIN R SEARCH 
IF CRES CNR) -RARG) 5410, 5310, 5a20 


PAGE 
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‘ 
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TARG ABOVE HIGHEST TABLE ARGUMENT AND 
WAS ABOVE TEMPAT ARGUMENT ON PREVIOUS ENTRY 


ao00 


$215 MLT=1 
EXTT=EXTTXTARG/TES(NT#+1) 
TARG=TES(NT+1) 


IF(RES(NR) .GT.RARG) G6 TO 5300 
IF CRES(NR+1). LE.RARG) GO TO 5400 
G6 TO (110,210) ,1ESTB 


GUT OF TABLE TEST, HIGH T 
5220 IF(NT-ITES(N+1)+2) 5205, 713,5205 


TARG ABGVE HIGHEST TABLE ARGUMENT BUT WAS 
NOT ABOVE TEMPAT ARGUMENT GN PREVIOUS ENTRY 


713 MLT=1 
EXTT=EXTT*TARG/TES(NT+1) 
TARG=TES(NT#1) 


Cc 

. BEGIN R SEARCH 

5 IF CRES(NR) -RARG) 5410, 5310, 5320 

c GOUT OF TABLE TEST, LOW R 

areas IFCNR.GT.IRES(N)) GO TO 5305 

c RARG BELOW LOWEST TABLE ARGUMENT BUT WAS 

6 NOT BELOW TEMPAT ARGUMENT GN PREVIGUS ENTRY 
MLR=-1 
EXTR=EXTR*RARG/RES (NR) 
RARG=RES (NR) 

" GS TS 5310 

Cc R SEARCH 

c RARG BELOW R STRIP OF at ENTRY 

. GUT OF TABLE TEST, LOW 

Pees [FCNR.GT.IRES(N)) GO TO 5305 

c RARG BELOW LOWEST TABLE ARGUMENT AND 

S WAS BELGW TEMPAT ARGUMENT ON PREVIOUS ENTRY 


MLR=-1 
EXTR=EXTR*RARG/RES (NR) 
RARG=RES (NR) 

M = MSVC(N) 

GO T6 (110,210) ,TESTB 


SEARCH TO NEXT LOWER R STRIP 


a00 


5305 NR=NR- 
TECRES(NR) - RARG) 5310,5310, 5320 


51 Cc BOX CONTAINING TARG AND RARG FOUND, COMPUTE NEW M 


c 
5310 


an0n0n0 lololeo mE elele) 


9000 


5400 


5405 
5410 


719 


5415 


M=IZES(N)+CIETES(N4+1)-1 TESON) -1)* (NR-TRESCN) )+NT-ITESCN) 
NTSVON} =NT 
NRSV(N)=NR 
MSV(N)=M 
G6 TS (110,210) ,fESTB 
GOUT SOF TABLE TEST, HIGH R 
IFCNR - ERES(N+1)4+2) 5405,5415, 5405 
SEARCH TO NEXT HIGHER R STRIP 
NR=NR+1 
IF CRES(NR+1).GT.RARG) GO TS 5310 
IF (NR-IRES(N+1)#3) 5405, 5405, 719 
RARG ABOVE HIGHEST TABLE ARGUMENT BUT WAS 
NOT ABOVE TEMPAT ARGUMENT GN PREVIOUS ENTRY 
MLR=1 


EXTR=EXTRXRARG/RES (NR*1 ) 
RARG=RES (NR+1 ) 
GO TO 5310 


RARG ABOVE HIGHEST TABLE ARGUMENT BUT 
M SAME AS ON PREVIOUS ENTRY 


MLR=1 
EXTR=EXTR*RARG/RES (NR+1 ) 
RARG=RES(NR+1 ) 


M = MSV(N) 
GO TO (110,210) ,lIESTB 
END 
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SUBROUTINE SETUP 


COMMON /COM2/ NTSV(2),NRSV(2) ,MSV(2}, TES(7) , RES(9) 
x ,AES(12),BES(12),CES(12),DES(12),EES(12),FES(12),GES(12) 
X ,HES(12),PES(12), 1TES(3), IRES(3), IZES(3) 


LL JES 
ata area LAW GAS EQUATION SF STATE FOR BIQUAD ROUTINE 
NRSV(1)=1 
MSV(1)=1 
NTSV(2)=4 
NRSV(2)=5 
MSV(2)=7 
[TES(1)=1 
PRES(1)=1 
IZES(1)=1 
ITES(2)=4 
IRES(2)=5 
IZES(2)=7 
ITES(3)=7 
IRES(3)=9 
IZES(3)=13 
TES( 1) = .OE+00 
TES( 2) = 1.,Q000E+00 
TES( 3) = =1,Q000E+02 
TES( 4) = .OE+00 
TES( 5S) = 1.0000E+00 
TES( 6) = 1.Q0000E+02 
TES( 7) = .OE+00 
RESC€ 17 = .OE+00 
RES( 2) = 3.0Q0G0E+00 
RES( 3) = 3.Q000E+02 
RES( 4) = 3, Q000E+10 
RES( 51] = .OE+00 
RES( 6) = 3.Q0Q0E+00 
RES( 713 = 3.QQ00E+02 
RES( 8) = 3.0Q000E+10 
RES( 9} = .OE+00 
AES( 1) = .OE+00 
BES( 1} = .OE+00 
CES( 1) = .O0E+00 
DES( 1) = .OE+00 
EES( 1) = , OE+00 
FES( 1) = 6.6667E-02 
GES( 1) = -1.2953E-16 
HES( 1) = ~4,.4409E-16 
PES( 1) = -9.2519E-17 
AES( 2) = .0E+00 
BES( 2) = -4. 7184E- 16 
CES( 2) = Lele) 
DESC 2) = -1. 71466 - 16 
EES( 2) = .OE+00 
FES( 2) = 6.6667E-02 
GES( 2) = -4.8247E-17 
HES( 2) = 1.0408E-17 
PES( 2) = -2.3426E-18 
AES( 3) = .OE+00 
BES( 3) = -OE+00 
CES( 3) = -8.0183E-17 


27 
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SUBROUTINE PROJCT 


THIS SUBROUTINE REFLECTS AN INTERIOR POINT ACROSS THE BOUNDARY 


COMMON /PROGG/RO, Z0,R1,21,RP,2P,RR, 22 


REFLECT (RP,ZP) TG (RR, 22) 
WHERE (RO,Z0) AND (R1,21) ARE BOUNDARY POINTS 


WW= (2, *(Z1-20))/CCR1~-RO) kx2+(21-2Z0) ¥x2) 
ALP=1.-(21-Z0) xWW 

BET=(R1-RO) xWWwW 

RR=RO+(RP-RO)XALP + (ZP-20)*BET 
Z2Z=ZO+(RP-RO)*BET - (ZP-Z20) *xALP 


RETURN 
END 
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SUBROUTINE CONDUCT 


COMMON /KLS/ K,L, DEBUG, VERSION, WHER, WHEN, P1D6,PIE, IGEN, P1D2 
x ,DTC, KC, LC, DTEN, KEN, LEN, SKE, HN, STEL, CNN, ENC, ENH, ENCG, WN 


COMMON /COMN/ R(33,33) ,2(33, 33) ,U(33,33),RHO(33, 33) ,Q(33, 33) 


E(33, 33), P(33, 33) ,AJ(33, 33), $(33, 33) , NBC(33, 33) 
w(33,3 P(33, 33) ; 


x 
x ; 
x , A(33) 33), BC CC( 33, 33) , DUM(33, 33), CBB(33, 33) 


3), TEM 
33,33), 


* pBB(33, 33), CAPC 


33,3 


3), $1634, 33) ,T$(33; 33) 


COMMON /PARAM/ NYCL, TNUP, DTNUP, DTN, DTNPH, DTNMH, EDTIME, EDDT 
x , GAM, GAMZ, COF. C1F, 61, TMAX, DTMAX, DTMIN, TFLR, NOHYD 
x ;C2,P2,P3,N6,NTTY, NED . 
COMMON /KLSPACE/ KMN,LMN,KMX,LMX,KMMZ,1LMXZ,KMNP, LMNP, KMXP, LMXP 


COMMON /EOSCOM/ KEOS, TARG1, TARG2, TARG3, RARG1, RARG2, RARGS, 
x FUNC1,FUNC2, FUNC3, TEMPS, EPS, EPSO 


ELECTRON CONDUCTION -LU- 


DS 10 L=LMN,LMX 
DO 10 K=KMN, KMX 
CAP(K,L)=.1 ; 
CC(K,L)=¢ .0001 *SQRT(TEMP(K, L)) xTEMPCK,L)**x2)/AJ(K,L) 
SIG(K,L)=DUM(K,L) *CAP(K,L) /DTNPH 
TS(K,LI=STEMP(K,L) 
CONTINUE 
DO 12 L=LMN,LMX 
DO 12 K=KMN,KMXZ 
CBB(K,L)=(2, xCC(K+1,L)*CO(K+1,L 41) )ACCC(K+1, L) +CC(K+1,L41)) 
X « (.5e(R(K,L)tRCK+T, 0) x CCRCKE1, LE) -RCK,L) 2 d 
x #(Z(K4+1,L)-20K,L))**2) ) 
12 CONTINUE 
DO 14 L=LMN, LMXZ 
DS 14 K=KMN,KMX - 
DBB(K,L)=(2. xCC(Kt1, L+1) *CC(K, Lt+1) A (CC(K+1,L4+1)+CC(K,L+1)) 
xX * €.$*(R(K,L) +RCK, L417) CORCK, L#1)-RCK,L) ) x2 
x +(Z(K,L4+1)-Z2(K,L))**2) ) 


14 CONTINUE 


BOUNDARY CONDITIONS 

~~ D6 17 L=LMN, LMX 
ACKMN,L)=0, : 
B(KMN, L)=TEMP(KMN, L) 
DBB(KMN, L)=0. 

17. CONTINUE 


DO 19 K=KMN, KMX 
A(K,LMN)=0. 

B(K,LMN) =TEMP(K, LMN) 
CBB(K, LMX)=0. 
Sie 


19 NTINUE 
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DO 53 K=KMNP,KMX 

DO 51 LSLMNP:LMXx 
SIG(K,L)+CBB(K-1, L) *CBB(K-1,L-1)*(1. -A(K,L-1)) 

B(K-1.L)/DUM(K,L) 

I P(K:L) 


G(K,L)*TEM +CBB(K-1,L-1)4B(K,L-1) 


ae 


xX )/ 
51 CONTINUE 
Cia ALPHA,BETA FORWARD 
ML=LMX+1 
DG 52 L=LMNP, LMX 
ML=ML~1 
TEMP (K,ML)=ACK,ML) *TEMP(K,ML+1)+B<K,ML} 
S2 CONTINUE 
C BACK SUBSTITUTION 
53 CONTINUE 


Cc 
Code teito tains Z SWEEP END 
Catan ... R SWEEP 


DS 43 L=LMNP,LMX 
DO 41 K=KMNP,KMX 


DUM(K,L)=SIG(K, L)+DBB(K, L-1)+DBB(K-1,L-1)%(1.-A(K-1,L)) 
A(K,L)=DBB(K, L-1) /DUM(K.L) 
B(K,L)=(SIG(K, L) xTEMP(K,L)+DBB(K-1,L-1)*B(K-1,L) 
X )/OUMCK,L) 
41 CONTINUE 
..... ALPHA BETA FORWARD SWEEP 
ML=KMX+1 
DO 42 K=KMNP, KMX 
ML=ML-1 
TEMP (ML,L)=A(ML,L)*TEMP(ML+1,L)4+BCML,L? 
CONTINUE 


42 
C BACK SUBSTITUTION R DIRECTION 
43 CONTINUE 


Cc 
Cae ras R_ SWEEP END 
. COMPUTE DT CONTROL FOR HEAT CONDUCTION 


YE=0. 
KEN=0 
LEN=0 
DO 111 L=LMNP,LMX 
DO 111 K=KMNP, KMX 
C GET NEW ENERGY 
ENH=ENH+E(K,L) «RHOCK,L) *SCK,L) 


TARG1=TEMP(K,L) 
RARG1=RHO(K,L) 
CALL IES2 


E(K,L)=AMAX1(FUNC1,1.E-30) 

ENC=ENC+E(K,L) *RHOCK,L) xS(K, Lb) 
IFCTS(K,L).EG.0.) G6’ TO 109 
TEMPR=ABS( ( TEMP(K,L)-TS(K,L))/TS(K,L)) 
IFCTEMPR.LE.YE) G6 Te 109 


YE=TEMPR 
KEN=K 


10 
11 


© 


(YE.EQ.0,) GO TO 118 
DTEN=(. 10 TNPH) /YE 
118 CONTINUE 
ENERGY BALANCE HN 
05 122 K=2,KMX 
HN=HN*DTNPH*CBB(K-1,L 
x +DTNPHxCBB(K-1,LMX) 
122 CONTINUE 


DO 124 L=2 


MN) 
x 


x (TEM 
(TEMP CK, 


CK, LMN) -TEMP(K,LMN+1)) 
LMX+1)-TEMP (CK, LMX)) 


LM 
HN= NASD TNPHS DBR CKMN, L-1)* CTEMPCKMN,L) - TEMP CKMN+17,L) ) 
x +DTNPH*DBB(KMX, L- 1) * (TEMP (KMX+1,L)-TEMP(KMX,L)) 


124 CONTINUE 


RETURN 
END 
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SUBROUTINE HWORK 


COMMON /KLS/ K,L, DEBUG, VERSION, WHER,WHEN,P1D6,PIE, IGEN, P1D2 
x ,OTC,KC,LC, DTEN, KEN, LEN, SKE, HN, SIEL, CNN, ENC, ENH, ENCG, WN 
a 


COMMON /COMN/ R(33,33),Z(33, 34) ,U(33, 33), RHO(33, 33) ,Q(33, 33) 
Xx E(33, 333 P(33 33), AJ(33, 33), $(33, 33), NBC(33, 33) 
x ,W(33, 33) | TEMP(33, 33) 

X , AC33)33),B(33; 33) ,CC(33, 33), DUM(33, 33), CBB(33, 33) 
X | DBB( 33, 33), CAP(33, 33), $iG(34, 33) ,T$(33, 33) 


COMMON /PARAM/ NYCL, TNUP, DTNUP, DTN, DTNPH, DTNMH, EDTIME 
x , GAM, GAMZ, COF, CIF, C1, TMAK, DTMAX, DTMIN, TFLR, NOHYD 
x 'C2,P2,P35N6, NTTY, NED 
COMMON /KLSPACE/ KMN,LMN, KMX, LMK, KMXAZ, LMAZ, KMNP, LMNP, KMXP, LMXP 
SUM THE HYDRO WORK ON THE BOUNDARY 
Z1=DTNPH/8. 


DG 510 K=KMNP, KMX 


EDDT 


WN=WN+2714(P(K, LMN+1)+P(K, LMN) +Q(K, LMN+1) +Q(K, LMN) ) 
x «CC CUCK,LMN) 4UCK-1, LMN) )x(ZC(K, LMNI-Z(K-1,LMN) ) 

x - (WCK, LMN) +WCK-1, DMN) )x(RCK,LMNI-RCK-1.LMN) ) 

X  )*CRCK,LMN) +R(K-1,LMN)) 

WN=WN-214(P(K, LMX+1) +P(K, LMX) +@(K, LMX+1) +Q(K, LMX) ) 
X *C CUCK, LAX) 4UCK-1, LMX) (20K, LMXI-ZCK-1, LAX) 

x - (WCK, LMX) +WCK-1,LMX) )eCRCK, LMX}-RCK-1,LMX)) 

“ X Ix (RCKLEMX) +R(K~1,LMX) ) 
510 CONTINUE 

DO 515 L=LMNP, LMXx 
WN=WN+Z14(P (KMN41, L)+P(KMN, L) +Q(KMN41,L)+Q(KMN,L)) 
xX x*C€ CUCKMN, L) +UCKMN, L-1))*(ZCKMN, L3-2CKMN,L-1)) 

x - (WCKMN, LL) +WCKMN.L-1))*(RCKMN,LI-RCKMN,L-1)) 

X  >*CRCKMN, L)+ROKMN.L-1)) 
WN=WN-Z1%(PCKMX41,, L) +P CKMX, L.) +QCKMX41,L) +Q(KMX,L)) 
xX #C CUCKMX,L)+UCKMX,L-1))®CZCKMX, LI-2CKMX,L-1)) 

x - (WCKMX; L) +WCKMX, L- 1) « CRCKMX, L-RCKMX, L-1)) 

X x CRCKMX;L) +ROKMX,L-1)) 


c 
515 CONTINUE 


RETURN 
END 
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FUNCTIGN NECGOND(IARG) 
TAG=0 


AA1=SECGND( IAG) 
NECOND= (AA1~-AA2) *1.E+6 
AA2=AAi 

RETURN 

END 
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